{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Holiday Package Prediciton\n",
    "\n",
    "### 1) Problem statement.\n",
    "\"Trips & Travel.Com\" company wants to enable and establish a viable business model to expand the customer base.\n",
    "One of the ways to expand the customer base is to introduce a new offering of packages. Currently, there are 5 types of packages the company is offering * Basic, Standard, Deluxe, Super Deluxe, King. Looking at the data of the last year, we observed that 18% of the customers purchased the packages. However, the marketing cost was quite high because customers were contacted at random without looking at the available information.\n",
    "The company is now planning to launch a new product i.e. Wellness Tourism Package. Wellness Tourism is defined as Travel that allows the traveler to maintain, enhance or kick-start a healthy lifestyle, and support or increase one's sense of well-being.\n",
    "However, this time company wants to harness the available data of existing and potential customers to make the marketing expenditure more efficient.\n",
    "### 2) Data Collection.\n",
    "The Dataset is collected from https://www.kaggle.com/datasets/susant4learning/holiday-package-purchase-prediction\n",
    "The data consists of 20 column and 4888 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "## importing important libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import plotly.express as px\n",
    "import warnings\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>ProdTaken</th>\n",
       "      <th>Age</th>\n",
       "      <th>TypeofContact</th>\n",
       "      <th>CityTier</th>\n",
       "      <th>DurationOfPitch</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Gender</th>\n",
       "      <th>NumberOfPersonVisiting</th>\n",
       "      <th>NumberOfFollowups</th>\n",
       "      <th>ProductPitched</th>\n",
       "      <th>PreferredPropertyStar</th>\n",
       "      <th>MaritalStatus</th>\n",
       "      <th>NumberOfTrips</th>\n",
       "      <th>Passport</th>\n",
       "      <th>PitchSatisfactionScore</th>\n",
       "      <th>OwnCar</th>\n",
       "      <th>NumberOfChildrenVisiting</th>\n",
       "      <th>Designation</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>200000</td>\n",
       "      <td>1</td>\n",
       "      <td>41.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Single</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20993.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>200001</td>\n",
       "      <td>0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Male</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20130.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>200002</td>\n",
       "      <td>1</td>\n",
       "      <td>37.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Free Lancer</td>\n",
       "      <td>Male</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Single</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17090.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>200003</td>\n",
       "      <td>0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17909.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>200004</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Small Business</td>\n",
       "      <td>Male</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>18468.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   CustomerID  ProdTaken   Age    TypeofContact  CityTier  DurationOfPitch  \\\n",
       "0      200000          1  41.0     Self Enquiry         3              6.0   \n",
       "1      200001          0  49.0  Company Invited         1             14.0   \n",
       "2      200002          1  37.0     Self Enquiry         1              8.0   \n",
       "3      200003          0  33.0  Company Invited         1              9.0   \n",
       "4      200004          0   NaN     Self Enquiry         1              8.0   \n",
       "\n",
       "       Occupation  Gender  NumberOfPersonVisiting  NumberOfFollowups  \\\n",
       "0        Salaried  Female                       3                3.0   \n",
       "1        Salaried    Male                       3                4.0   \n",
       "2     Free Lancer    Male                       3                4.0   \n",
       "3        Salaried  Female                       2                3.0   \n",
       "4  Small Business    Male                       2                3.0   \n",
       "\n",
       "  ProductPitched  PreferredPropertyStar MaritalStatus  NumberOfTrips  \\\n",
       "0         Deluxe                    3.0        Single            1.0   \n",
       "1         Deluxe                    4.0      Divorced            2.0   \n",
       "2          Basic                    3.0        Single            7.0   \n",
       "3          Basic                    3.0      Divorced            2.0   \n",
       "4          Basic                    4.0      Divorced            1.0   \n",
       "\n",
       "   Passport  PitchSatisfactionScore  OwnCar  NumberOfChildrenVisiting  \\\n",
       "0         1                       2       1                       0.0   \n",
       "1         0                       3       1                       2.0   \n",
       "2         1                       3       0                       0.0   \n",
       "3         1                       5       1                       1.0   \n",
       "4         0                       5       1                       0.0   \n",
       "\n",
       "  Designation  MonthlyIncome  \n",
       "0     Manager        20993.0  \n",
       "1     Manager        20130.0  \n",
       "2   Executive        17090.0  \n",
       "3   Executive        17909.0  \n",
       "4   Executive        18468.0  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"Travel.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Cleaning\n",
    "### Handling Missing values\n",
    "1. Handling Missing values\n",
    "2. Handling Duplicates\n",
    "3. Check data type\n",
    "4. Understand the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CustomerID                    0\n",
       "ProdTaken                     0\n",
       "Age                         226\n",
       "TypeofContact                25\n",
       "CityTier                      0\n",
       "DurationOfPitch             251\n",
       "Occupation                    0\n",
       "Gender                        0\n",
       "NumberOfPersonVisiting        0\n",
       "NumberOfFollowups            45\n",
       "ProductPitched                0\n",
       "PreferredPropertyStar        26\n",
       "MaritalStatus                 0\n",
       "NumberOfTrips               140\n",
       "Passport                      0\n",
       "PitchSatisfactionScore        0\n",
       "OwnCar                        0\n",
       "NumberOfChildrenVisiting     66\n",
       "Designation                   0\n",
       "MonthlyIncome               233\n",
       "dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Male       2916\n",
       "Female     1817\n",
       "Fe Male     155\n",
       "Name: Gender, dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### Check all the categories \n",
    "df['Gender'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Married      2340\n",
       "Divorced      950\n",
       "Single        916\n",
       "Unmarried     682\n",
       "Name: MaritalStatus, dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['MaritalStatus'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Self Enquiry       3444\n",
       "Company Invited    1419\n",
       "Name: TypeofContact, dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['TypeofContact'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['Gender'] = df['Gender'].replace('Fe Male', 'Female')\n",
    "df['MaritalStatus'] = df['MaritalStatus'].replace('Single', 'Unmarried')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Male      2916\n",
       "Female    1972\n",
       "Name: Gender, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### Check all the categories \n",
    "df['Gender'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>ProdTaken</th>\n",
       "      <th>Age</th>\n",
       "      <th>TypeofContact</th>\n",
       "      <th>CityTier</th>\n",
       "      <th>DurationOfPitch</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Gender</th>\n",
       "      <th>NumberOfPersonVisiting</th>\n",
       "      <th>NumberOfFollowups</th>\n",
       "      <th>ProductPitched</th>\n",
       "      <th>PreferredPropertyStar</th>\n",
       "      <th>MaritalStatus</th>\n",
       "      <th>NumberOfTrips</th>\n",
       "      <th>Passport</th>\n",
       "      <th>PitchSatisfactionScore</th>\n",
       "      <th>OwnCar</th>\n",
       "      <th>NumberOfChildrenVisiting</th>\n",
       "      <th>Designation</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>200000</td>\n",
       "      <td>1</td>\n",
       "      <td>41.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20993.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>200001</td>\n",
       "      <td>0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Male</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20130.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>200002</td>\n",
       "      <td>1</td>\n",
       "      <td>37.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Free Lancer</td>\n",
       "      <td>Male</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17090.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>200003</td>\n",
       "      <td>0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17909.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>200004</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Small Business</td>\n",
       "      <td>Male</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>18468.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   CustomerID  ProdTaken   Age    TypeofContact  CityTier  DurationOfPitch  \\\n",
       "0      200000          1  41.0     Self Enquiry         3              6.0   \n",
       "1      200001          0  49.0  Company Invited         1             14.0   \n",
       "2      200002          1  37.0     Self Enquiry         1              8.0   \n",
       "3      200003          0  33.0  Company Invited         1              9.0   \n",
       "4      200004          0   NaN     Self Enquiry         1              8.0   \n",
       "\n",
       "       Occupation  Gender  NumberOfPersonVisiting  NumberOfFollowups  \\\n",
       "0        Salaried  Female                       3                3.0   \n",
       "1        Salaried    Male                       3                4.0   \n",
       "2     Free Lancer    Male                       3                4.0   \n",
       "3        Salaried  Female                       2                3.0   \n",
       "4  Small Business    Male                       2                3.0   \n",
       "\n",
       "  ProductPitched  PreferredPropertyStar MaritalStatus  NumberOfTrips  \\\n",
       "0         Deluxe                    3.0     Unmarried            1.0   \n",
       "1         Deluxe                    4.0      Divorced            2.0   \n",
       "2          Basic                    3.0     Unmarried            7.0   \n",
       "3          Basic                    3.0      Divorced            2.0   \n",
       "4          Basic                    4.0      Divorced            1.0   \n",
       "\n",
       "   Passport  PitchSatisfactionScore  OwnCar  NumberOfChildrenVisiting  \\\n",
       "0         1                       2       1                       0.0   \n",
       "1         0                       3       1                       2.0   \n",
       "2         1                       3       0                       0.0   \n",
       "3         1                       5       1                       1.0   \n",
       "4         0                       5       1                       0.0   \n",
       "\n",
       "  Designation  MonthlyIncome  \n",
       "0     Manager        20993.0  \n",
       "1     Manager        20130.0  \n",
       "2   Executive        17090.0  \n",
       "3   Executive        17909.0  \n",
       "4   Executive        18468.0  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Age 4.62357 % missing values\n",
      "TypeofContact 0.51146 % missing values\n",
      "DurationOfPitch 5.13502 % missing values\n",
      "NumberOfFollowups 0.92062 % missing values\n",
      "PreferredPropertyStar 0.53191 % missing values\n",
      "NumberOfTrips 2.86416 % missing values\n",
      "NumberOfChildrenVisiting 1.35025 % missing values\n",
      "MonthlyIncome 4.76678 % missing values\n"
     ]
    }
   ],
   "source": [
    "## Check Misssing Values\n",
    "##these are the features with nan value\n",
    "features_with_na=[features for features in df.columns if df[features].isnull().sum()>=1]\n",
    "for feature in features_with_na:\n",
    "    print(feature,np.round(df[feature].isnull().mean()*100,5), '% missing values')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>DurationOfPitch</th>\n",
       "      <th>NumberOfFollowups</th>\n",
       "      <th>PreferredPropertyStar</th>\n",
       "      <th>NumberOfTrips</th>\n",
       "      <th>NumberOfChildrenVisiting</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>4662.000000</td>\n",
       "      <td>4637.000000</td>\n",
       "      <td>4843.000000</td>\n",
       "      <td>4862.000000</td>\n",
       "      <td>4748.000000</td>\n",
       "      <td>4822.000000</td>\n",
       "      <td>4655.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>37.622265</td>\n",
       "      <td>15.490835</td>\n",
       "      <td>3.708445</td>\n",
       "      <td>3.581037</td>\n",
       "      <td>3.236521</td>\n",
       "      <td>1.187267</td>\n",
       "      <td>23619.853491</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>9.316387</td>\n",
       "      <td>8.519643</td>\n",
       "      <td>1.002509</td>\n",
       "      <td>0.798009</td>\n",
       "      <td>1.849019</td>\n",
       "      <td>0.857861</td>\n",
       "      <td>5380.698361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>18.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1000.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>31.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>20346.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>36.000000</td>\n",
       "      <td>13.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>22347.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>44.000000</td>\n",
       "      <td>20.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>25571.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>61.000000</td>\n",
       "      <td>127.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>22.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>98678.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Age  DurationOfPitch  NumberOfFollowups  PreferredPropertyStar  \\\n",
       "count  4662.000000      4637.000000        4843.000000            4862.000000   \n",
       "mean     37.622265        15.490835           3.708445               3.581037   \n",
       "std       9.316387         8.519643           1.002509               0.798009   \n",
       "min      18.000000         5.000000           1.000000               3.000000   \n",
       "25%      31.000000         9.000000           3.000000               3.000000   \n",
       "50%      36.000000        13.000000           4.000000               3.000000   \n",
       "75%      44.000000        20.000000           4.000000               4.000000   \n",
       "max      61.000000       127.000000           6.000000               5.000000   \n",
       "\n",
       "       NumberOfTrips  NumberOfChildrenVisiting  MonthlyIncome  \n",
       "count    4748.000000               4822.000000    4655.000000  \n",
       "mean        3.236521                  1.187267   23619.853491  \n",
       "std         1.849019                  0.857861    5380.698361  \n",
       "min         1.000000                  0.000000    1000.000000  \n",
       "25%         2.000000                  1.000000   20346.000000  \n",
       "50%         3.000000                  1.000000   22347.000000  \n",
       "75%         4.000000                  2.000000   25571.000000  \n",
       "max        22.000000                  3.000000   98678.000000  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# statistics on numerical columns (Null cols)\n",
    "df[features_with_na].select_dtypes(exclude='object').describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imputing Null values\n",
    "1. Impute Median value for Age column\n",
    "2. Impute Mode for Type of Contract\n",
    "3. Impute Median for Duration of Pitch\n",
    "4. Impute Mode for NumberofFollowup as it is Discrete feature\n",
    "5. Impute Mode for PreferredPropertyStar\n",
    "6. Impute Median for NumberofTrips\n",
    "7. Impute Mode for NumberOfChildrenVisiting\n",
    "8. Impute Median for MonthlyIncome"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Age\n",
    "df.Age.fillna(df.Age.median(), inplace=True)\n",
    "\n",
    "#TypeofContract\n",
    "df.TypeofContact.fillna(df.TypeofContact.mode()[0], inplace=True)\n",
    "\n",
    "#DurationOfPitch\n",
    "df.DurationOfPitch.fillna(df.DurationOfPitch.median(), inplace=True)\n",
    "\n",
    "#NumberOfFollowups\n",
    "df.NumberOfFollowups.fillna(df.NumberOfFollowups.mode()[0], inplace=True)\n",
    "\n",
    "#PreferredPropertyStar\n",
    "df.PreferredPropertyStar.fillna(df.PreferredPropertyStar.mode()[0], inplace=True)\n",
    "\n",
    "#NumberOfTrips\n",
    "df.NumberOfTrips.fillna(df.NumberOfTrips.median(), inplace=True)\n",
    "\n",
    "#NumberOfChildrenVisiting\n",
    "df.NumberOfChildrenVisiting.fillna(df.NumberOfChildrenVisiting.mode()[0], inplace=True)\n",
    "\n",
    "#MonthlyIncome\n",
    "df.MonthlyIncome.fillna(df.MonthlyIncome.median(), inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CustomerID                  0\n",
       "ProdTaken                   0\n",
       "Age                         0\n",
       "TypeofContact               0\n",
       "CityTier                    0\n",
       "DurationOfPitch             0\n",
       "Occupation                  0\n",
       "Gender                      0\n",
       "NumberOfPersonVisiting      0\n",
       "NumberOfFollowups           0\n",
       "ProductPitched              0\n",
       "PreferredPropertyStar       0\n",
       "MaritalStatus               0\n",
       "NumberOfTrips               0\n",
       "Passport                    0\n",
       "PitchSatisfactionScore      0\n",
       "OwnCar                      0\n",
       "NumberOfChildrenVisiting    0\n",
       "Designation                 0\n",
       "MonthlyIncome               0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()\n",
    "df.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.drop('CustomerID', inplace=True, axis=1)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature Engineering\n",
    "\n",
    "### Feature Extraction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ProdTaken</th>\n",
       "      <th>Age</th>\n",
       "      <th>TypeofContact</th>\n",
       "      <th>CityTier</th>\n",
       "      <th>DurationOfPitch</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Gender</th>\n",
       "      <th>NumberOfPersonVisiting</th>\n",
       "      <th>NumberOfFollowups</th>\n",
       "      <th>ProductPitched</th>\n",
       "      <th>PreferredPropertyStar</th>\n",
       "      <th>MaritalStatus</th>\n",
       "      <th>NumberOfTrips</th>\n",
       "      <th>Passport</th>\n",
       "      <th>PitchSatisfactionScore</th>\n",
       "      <th>OwnCar</th>\n",
       "      <th>NumberOfChildrenVisiting</th>\n",
       "      <th>Designation</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>41.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20993.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Male</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20130.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>37.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Free Lancer</td>\n",
       "      <td>Male</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17090.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17909.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Small Business</td>\n",
       "      <td>Male</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>18468.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ProdTaken   Age    TypeofContact  CityTier  DurationOfPitch  \\\n",
       "0          1  41.0     Self Enquiry         3              6.0   \n",
       "1          0  49.0  Company Invited         1             14.0   \n",
       "2          1  37.0     Self Enquiry         1              8.0   \n",
       "3          0  33.0  Company Invited         1              9.0   \n",
       "4          0  36.0     Self Enquiry         1              8.0   \n",
       "\n",
       "       Occupation  Gender  NumberOfPersonVisiting  NumberOfFollowups  \\\n",
       "0        Salaried  Female                       3                3.0   \n",
       "1        Salaried    Male                       3                4.0   \n",
       "2     Free Lancer    Male                       3                4.0   \n",
       "3        Salaried  Female                       2                3.0   \n",
       "4  Small Business    Male                       2                3.0   \n",
       "\n",
       "  ProductPitched  PreferredPropertyStar MaritalStatus  NumberOfTrips  \\\n",
       "0         Deluxe                    3.0     Unmarried            1.0   \n",
       "1         Deluxe                    4.0      Divorced            2.0   \n",
       "2          Basic                    3.0     Unmarried            7.0   \n",
       "3          Basic                    3.0      Divorced            2.0   \n",
       "4          Basic                    4.0      Divorced            1.0   \n",
       "\n",
       "   Passport  PitchSatisfactionScore  OwnCar  NumberOfChildrenVisiting  \\\n",
       "0         1                       2       1                       0.0   \n",
       "1         0                       3       1                       2.0   \n",
       "2         1                       3       0                       0.0   \n",
       "3         1                       5       1                       1.0   \n",
       "4         0                       5       1                       0.0   \n",
       "\n",
       "  Designation  MonthlyIncome  \n",
       "0     Manager        20993.0  \n",
       "1     Manager        20130.0  \n",
       "2   Executive        17090.0  \n",
       "3   Executive        17909.0  \n",
       "4   Executive        18468.0  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create new column for feature\n",
    "df['TotalVisiting'] = df['NumberOfPersonVisiting'] + df['NumberOfChildrenVisiting']\n",
    "df.drop(columns=['NumberOfPersonVisiting', 'NumberOfChildrenVisiting'], axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num of Numerical Features : 12\n"
     ]
    }
   ],
   "source": [
    "## get all the numeric features\n",
    "num_features = [feature for feature in df.columns if df[feature].dtype != 'O']\n",
    "print('Num of Numerical Features :', len(num_features))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num of Categorical Features : 6\n"
     ]
    }
   ],
   "source": [
    "##categorical features\n",
    "cat_features = [feature for feature in df.columns if df[feature].dtype == 'O']\n",
    "print('Num of Categorical Features :', len(cat_features))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num of Discrete Features : 9\n"
     ]
    }
   ],
   "source": [
    "## Discrete features\n",
    "discrete_features=[feature for feature in num_features if len(df[feature].unique())<=25]\n",
    "print('Num of Discrete Features :',len(discrete_features))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num of Continuous Features : 3\n"
     ]
    }
   ],
   "source": [
    "## coontinuous features\n",
    "continuous_features=[feature for feature in num_features if feature not in discrete_features]\n",
    "print('Num of Continuous Features :',len(continuous_features))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ProdTaken</th>\n",
       "      <th>Age</th>\n",
       "      <th>TypeofContact</th>\n",
       "      <th>CityTier</th>\n",
       "      <th>DurationOfPitch</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Gender</th>\n",
       "      <th>NumberOfFollowups</th>\n",
       "      <th>ProductPitched</th>\n",
       "      <th>PreferredPropertyStar</th>\n",
       "      <th>MaritalStatus</th>\n",
       "      <th>NumberOfTrips</th>\n",
       "      <th>Passport</th>\n",
       "      <th>PitchSatisfactionScore</th>\n",
       "      <th>OwnCar</th>\n",
       "      <th>Designation</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "      <th>TotalVisiting</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>41.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20993.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Male</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20130.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>37.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Free Lancer</td>\n",
       "      <td>Male</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17090.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17909.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Small Business</td>\n",
       "      <td>Male</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>Executive</td>\n",
       "      <td>18468.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ProdTaken   Age    TypeofContact  CityTier  DurationOfPitch  \\\n",
       "0          1  41.0     Self Enquiry         3              6.0   \n",
       "1          0  49.0  Company Invited         1             14.0   \n",
       "2          1  37.0     Self Enquiry         1              8.0   \n",
       "3          0  33.0  Company Invited         1              9.0   \n",
       "4          0  36.0     Self Enquiry         1              8.0   \n",
       "\n",
       "       Occupation  Gender  NumberOfFollowups ProductPitched  \\\n",
       "0        Salaried  Female                3.0         Deluxe   \n",
       "1        Salaried    Male                4.0         Deluxe   \n",
       "2     Free Lancer    Male                4.0          Basic   \n",
       "3        Salaried  Female                3.0          Basic   \n",
       "4  Small Business    Male                3.0          Basic   \n",
       "\n",
       "   PreferredPropertyStar MaritalStatus  NumberOfTrips  Passport  \\\n",
       "0                    3.0     Unmarried            1.0         1   \n",
       "1                    4.0      Divorced            2.0         0   \n",
       "2                    3.0     Unmarried            7.0         1   \n",
       "3                    3.0      Divorced            2.0         1   \n",
       "4                    4.0      Divorced            1.0         0   \n",
       "\n",
       "   PitchSatisfactionScore  OwnCar Designation  MonthlyIncome  TotalVisiting  \n",
       "0                       2       1     Manager        20993.0            3.0  \n",
       "1                       3       1     Manager        20130.0            5.0  \n",
       "2                       3       0   Executive        17090.0            3.0  \n",
       "3                       5       1   Executive        17909.0            3.0  \n",
       "4                       5       1   Executive        18468.0            2.0  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train Test Split And Model Training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X = df.drop(['ProdTaken'], axis=1)\n",
    "y = df['ProdTaken']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>TypeofContact</th>\n",
       "      <th>CityTier</th>\n",
       "      <th>DurationOfPitch</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Gender</th>\n",
       "      <th>NumberOfFollowups</th>\n",
       "      <th>ProductPitched</th>\n",
       "      <th>PreferredPropertyStar</th>\n",
       "      <th>MaritalStatus</th>\n",
       "      <th>NumberOfTrips</th>\n",
       "      <th>Passport</th>\n",
       "      <th>PitchSatisfactionScore</th>\n",
       "      <th>OwnCar</th>\n",
       "      <th>Designation</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "      <th>TotalVisiting</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>41.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20993.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>49.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Male</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20130.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>37.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Free Lancer</td>\n",
       "      <td>Male</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17090.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>33.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17909.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>36.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Small Business</td>\n",
       "      <td>Male</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>Executive</td>\n",
       "      <td>18468.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Age    TypeofContact  CityTier  DurationOfPitch      Occupation  Gender  \\\n",
       "0  41.0     Self Enquiry         3              6.0        Salaried  Female   \n",
       "1  49.0  Company Invited         1             14.0        Salaried    Male   \n",
       "2  37.0     Self Enquiry         1              8.0     Free Lancer    Male   \n",
       "3  33.0  Company Invited         1              9.0        Salaried  Female   \n",
       "4  36.0     Self Enquiry         1              8.0  Small Business    Male   \n",
       "\n",
       "   NumberOfFollowups ProductPitched  PreferredPropertyStar MaritalStatus  \\\n",
       "0                3.0         Deluxe                    3.0     Unmarried   \n",
       "1                4.0         Deluxe                    4.0      Divorced   \n",
       "2                4.0          Basic                    3.0     Unmarried   \n",
       "3                3.0          Basic                    3.0      Divorced   \n",
       "4                3.0          Basic                    4.0      Divorced   \n",
       "\n",
       "   NumberOfTrips  Passport  PitchSatisfactionScore  OwnCar Designation  \\\n",
       "0            1.0         1                       2       1     Manager   \n",
       "1            2.0         0                       3       1     Manager   \n",
       "2            7.0         1                       3       0   Executive   \n",
       "3            2.0         1                       5       1   Executive   \n",
       "4            1.0         0                       5       1   Executive   \n",
       "\n",
       "   MonthlyIncome  TotalVisiting  \n",
       "0        20993.0            3.0  \n",
       "1        20130.0            5.0  \n",
       "2        17090.0            3.0  \n",
       "3        17909.0            3.0  \n",
       "4        18468.0            2.0  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    3968\n",
       "1     920\n",
       "Name: ProdTaken, dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>TypeofContact</th>\n",
       "      <th>CityTier</th>\n",
       "      <th>DurationOfPitch</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Gender</th>\n",
       "      <th>NumberOfFollowups</th>\n",
       "      <th>ProductPitched</th>\n",
       "      <th>PreferredPropertyStar</th>\n",
       "      <th>MaritalStatus</th>\n",
       "      <th>NumberOfTrips</th>\n",
       "      <th>Passport</th>\n",
       "      <th>PitchSatisfactionScore</th>\n",
       "      <th>OwnCar</th>\n",
       "      <th>Designation</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "      <th>TotalVisiting</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>41.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20993.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>49.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Male</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Deluxe</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>Manager</td>\n",
       "      <td>20130.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>37.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Free Lancer</td>\n",
       "      <td>Male</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17090.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>33.0</td>\n",
       "      <td>Company Invited</td>\n",
       "      <td>1</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Salaried</td>\n",
       "      <td>Female</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>Executive</td>\n",
       "      <td>17909.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>36.0</td>\n",
       "      <td>Self Enquiry</td>\n",
       "      <td>1</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Small Business</td>\n",
       "      <td>Male</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Basic</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>Executive</td>\n",
       "      <td>18468.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Age    TypeofContact  CityTier  DurationOfPitch      Occupation  Gender  \\\n",
       "0  41.0     Self Enquiry         3              6.0        Salaried  Female   \n",
       "1  49.0  Company Invited         1             14.0        Salaried    Male   \n",
       "2  37.0     Self Enquiry         1              8.0     Free Lancer    Male   \n",
       "3  33.0  Company Invited         1              9.0        Salaried  Female   \n",
       "4  36.0     Self Enquiry         1              8.0  Small Business    Male   \n",
       "\n",
       "   NumberOfFollowups ProductPitched  PreferredPropertyStar MaritalStatus  \\\n",
       "0                3.0         Deluxe                    3.0     Unmarried   \n",
       "1                4.0         Deluxe                    4.0      Divorced   \n",
       "2                4.0          Basic                    3.0     Unmarried   \n",
       "3                3.0          Basic                    3.0      Divorced   \n",
       "4                3.0          Basic                    4.0      Divorced   \n",
       "\n",
       "   NumberOfTrips  Passport  PitchSatisfactionScore  OwnCar Designation  \\\n",
       "0            1.0         1                       2       1     Manager   \n",
       "1            2.0         0                       3       1     Manager   \n",
       "2            7.0         1                       3       0   Executive   \n",
       "3            2.0         1                       5       1   Executive   \n",
       "4            1.0         0                       5       1   Executive   \n",
       "\n",
       "   MonthlyIncome  TotalVisiting  \n",
       "0        20993.0            3.0  \n",
       "1        20130.0            5.0  \n",
       "2        17090.0            3.0  \n",
       "3        17909.0            3.0  \n",
       "4        18468.0            2.0  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((3910, 17), (978, 17))"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# separate dataset into train and test\n",
    "X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)\n",
    "X_train.shape, X_test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 4888 entries, 0 to 4887\n",
      "Data columns (total 17 columns):\n",
      " #   Column                  Non-Null Count  Dtype  \n",
      "---  ------                  --------------  -----  \n",
      " 0   Age                     4888 non-null   float64\n",
      " 1   TypeofContact           4888 non-null   object \n",
      " 2   CityTier                4888 non-null   int64  \n",
      " 3   DurationOfPitch         4888 non-null   float64\n",
      " 4   Occupation              4888 non-null   object \n",
      " 5   Gender                  4888 non-null   object \n",
      " 6   NumberOfFollowups       4888 non-null   float64\n",
      " 7   ProductPitched          4888 non-null   object \n",
      " 8   PreferredPropertyStar   4888 non-null   float64\n",
      " 9   MaritalStatus           4888 non-null   object \n",
      " 10  NumberOfTrips           4888 non-null   float64\n",
      " 11  Passport                4888 non-null   int64  \n",
      " 12  PitchSatisfactionScore  4888 non-null   int64  \n",
      " 13  OwnCar                  4888 non-null   int64  \n",
      " 14  Designation             4888 non-null   object \n",
      " 15  MonthlyIncome           4888 non-null   float64\n",
      " 16  TotalVisiting           4888 non-null   float64\n",
      "dtypes: float64(7), int64(4), object(6)\n",
      "memory usage: 649.3+ KB\n"
     ]
    }
   ],
   "source": [
    "X.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create Column Transformer with 3 types of transformers\n",
    "cat_features = X.select_dtypes(include=\"object\").columns\n",
    "num_features = X.select_dtypes(exclude=\"object\").columns\n",
    "\n",
    "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
    "from sklearn.compose import ColumnTransformer\n",
    "\n",
    "numeric_transformer = StandardScaler()\n",
    "oh_transformer = OneHotEncoder(drop='first')\n",
    "\n",
    "preprocessor = ColumnTransformer(\n",
    "    [\n",
    "         (\"OneHotEncoder\", oh_transformer, cat_features),\n",
    "          (\"StandardScaler\", numeric_transformer, num_features)\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ColumnTransformer(transformers=[('OneHotEncoder', OneHotEncoder(drop='first'),\n",
       "                                 Index(['TypeofContact', 'Occupation', 'Gender', 'ProductPitched',\n",
       "       'MaritalStatus', 'Designation'],\n",
       "      dtype='object')),\n",
       "                                ('StandardScaler', StandardScaler(),\n",
       "                                 Index(['Age', 'CityTier', 'DurationOfPitch', 'NumberOfFollowups',\n",
       "       'PreferredPropertyStar', 'NumberOfTrips', 'Passport',\n",
       "       'PitchSatisfactionScore', 'OwnCar', 'MonthlyIncome', 'TotalVisiting'],\n",
       "      dtype='object'))])"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "preprocessor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "## applying Trnsformation in training(fit_transform)\n",
    "X_train=preprocessor.fit_transform(X_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>16</th>\n",
       "      <th>17</th>\n",
       "      <th>18</th>\n",
       "      <th>19</th>\n",
       "      <th>20</th>\n",
       "      <th>21</th>\n",
       "      <th>22</th>\n",
       "      <th>23</th>\n",
       "      <th>24</th>\n",
       "      <th>25</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-1.020350</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.127737</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>0.679690</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.382245</td>\n",
       "      <td>-0.774151</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>0.690023</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>1.511598</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>0.679690</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.459799</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-1.020350</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>1.771041</td>\n",
       "      <td>0.418708</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>0.679690</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.245196</td>\n",
       "      <td>-0.065268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-1.020350</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.127737</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>1.408395</td>\n",
       "      <td>-1.277194</td>\n",
       "      <td>0.213475</td>\n",
       "      <td>-0.065268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>2.400396</td>\n",
       "      <td>-1.720227</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>1.511598</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-0.049015</td>\n",
       "      <td>-1.277194</td>\n",
       "      <td>-0.024889</td>\n",
       "      <td>2.061382</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3905</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-0.653841</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.674182</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-1.506426</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.536973</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3906</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.455047</td>\n",
       "      <td>-0.898180</td>\n",
       "      <td>-0.718725</td>\n",
       "      <td>1.771041</td>\n",
       "      <td>-1.220627</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>1.408395</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>1.529609</td>\n",
       "      <td>-0.065268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3907</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.455047</td>\n",
       "      <td>1.545210</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>2.058043</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-0.777720</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.360576</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3908</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.455047</td>\n",
       "      <td>1.789549</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.127737</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-1.506426</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.252799</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3909</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-0.776011</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-1.220627</td>\n",
       "      <td>1.581280</td>\n",
       "      <td>-0.049015</td>\n",
       "      <td>-1.277194</td>\n",
       "      <td>-1.082511</td>\n",
       "      <td>-1.483035</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3910 rows × 26 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       0    1    2    3    4    5    6    7    8    9   ...        16  \\\n",
       "0     1.0  0.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "1     1.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  ... -0.721400   \n",
       "2     1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "3     1.0  0.0  1.0  0.0  1.0  1.0  0.0  0.0  0.0  1.0  ... -0.721400   \n",
       "4     0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "...   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...       ...   \n",
       "3905  1.0  0.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "3906  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  ...  1.455047   \n",
       "3907  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  1.455047   \n",
       "3908  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0  1.0  ...  1.455047   \n",
       "3909  0.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "\n",
       "            17        18        19        20        21        22        23  \\\n",
       "0    -1.020350  1.284279 -0.725271 -0.127737 -0.632399  0.679690  0.782966   \n",
       "1     0.690023  0.282777 -0.725271  1.511598 -0.632399  0.679690  0.782966   \n",
       "2    -1.020350  0.282777  1.771041  0.418708 -0.632399  0.679690  0.782966   \n",
       "3    -1.020350  1.284279 -0.725271 -0.127737 -0.632399  1.408395 -1.277194   \n",
       "4     2.400396 -1.720227 -0.725271  1.511598 -0.632399 -0.049015 -1.277194   \n",
       "...        ...       ...       ...       ...       ...       ...       ...   \n",
       "3905 -0.653841  1.284279 -0.725271 -0.674182 -0.632399 -1.506426  0.782966   \n",
       "3906 -0.898180 -0.718725  1.771041 -1.220627 -0.632399  1.408395  0.782966   \n",
       "3907  1.545210  0.282777 -0.725271  2.058043 -0.632399 -0.777720  0.782966   \n",
       "3908  1.789549  1.284279 -0.725271 -0.127737 -0.632399 -1.506426  0.782966   \n",
       "3909 -0.776011  0.282777 -0.725271 -1.220627  1.581280 -0.049015 -1.277194   \n",
       "\n",
       "            24        25  \n",
       "0    -0.382245 -0.774151  \n",
       "1    -0.459799  0.643615  \n",
       "2    -0.245196 -0.065268  \n",
       "3     0.213475 -0.065268  \n",
       "4    -0.024889  2.061382  \n",
       "...        ...       ...  \n",
       "3905 -0.536973  0.643615  \n",
       "3906  1.529609 -0.065268  \n",
       "3907 -0.360576  0.643615  \n",
       "3908 -0.252799  0.643615  \n",
       "3909 -1.082511 -1.483035  \n",
       "\n",
       "[3910 rows x 26 columns]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(X_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "## apply tansformation on test(transform)\n",
    "X_test=preprocessor.transform(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.        ,  0.        ,  0.        , ..., -1.2771941 ,\n",
       "        -0.73751038, -0.77415132],\n",
       "       [ 1.        ,  0.        ,  0.        , ..., -1.2771941 ,\n",
       "        -0.6704111 , -0.06526803],\n",
       "       [ 1.        ,  0.        ,  0.        , ...,  0.78296635,\n",
       "        -0.4208322 , -0.77415132],\n",
       "       ...,\n",
       "       [ 0.        ,  1.        ,  0.        , ...,  0.78296635,\n",
       "         0.69001249,  0.64361526],\n",
       "       [ 1.        ,  0.        ,  0.        , ...,  0.78296635,\n",
       "        -0.22827818, -0.77415132],\n",
       "       [ 1.        ,  1.        ,  0.        , ...,  0.78296635,\n",
       "        -0.44611323,  2.06138184]])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## XgboostBoost Classifier Training\n",
    "#### We can also combine multiple algorithms\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>16</th>\n",
       "      <th>17</th>\n",
       "      <th>18</th>\n",
       "      <th>19</th>\n",
       "      <th>20</th>\n",
       "      <th>21</th>\n",
       "      <th>22</th>\n",
       "      <th>23</th>\n",
       "      <th>24</th>\n",
       "      <th>25</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-1.020350</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.127737</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>0.679690</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.382245</td>\n",
       "      <td>-0.774151</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>0.690023</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>1.511598</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>0.679690</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.459799</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-1.020350</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>1.771041</td>\n",
       "      <td>0.418708</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>0.679690</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.245196</td>\n",
       "      <td>-0.065268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-1.020350</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.127737</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>1.408395</td>\n",
       "      <td>-1.277194</td>\n",
       "      <td>0.213475</td>\n",
       "      <td>-0.065268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>2.400396</td>\n",
       "      <td>-1.720227</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>1.511598</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-0.049015</td>\n",
       "      <td>-1.277194</td>\n",
       "      <td>-0.024889</td>\n",
       "      <td>2.061382</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3905</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-0.653841</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.674182</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-1.506426</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.536973</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3906</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.455047</td>\n",
       "      <td>-0.898180</td>\n",
       "      <td>-0.718725</td>\n",
       "      <td>1.771041</td>\n",
       "      <td>-1.220627</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>1.408395</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>1.529609</td>\n",
       "      <td>-0.065268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3907</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.455047</td>\n",
       "      <td>1.545210</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>2.058043</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-0.777720</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.360576</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3908</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.455047</td>\n",
       "      <td>1.789549</td>\n",
       "      <td>1.284279</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-0.127737</td>\n",
       "      <td>-0.632399</td>\n",
       "      <td>-1.506426</td>\n",
       "      <td>0.782966</td>\n",
       "      <td>-0.252799</td>\n",
       "      <td>0.643615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3909</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.721400</td>\n",
       "      <td>-0.776011</td>\n",
       "      <td>0.282777</td>\n",
       "      <td>-0.725271</td>\n",
       "      <td>-1.220627</td>\n",
       "      <td>1.581280</td>\n",
       "      <td>-0.049015</td>\n",
       "      <td>-1.277194</td>\n",
       "      <td>-1.082511</td>\n",
       "      <td>-1.483035</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3910 rows × 26 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       0    1    2    3    4    5    6    7    8    9   ...        16  \\\n",
       "0     1.0  0.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "1     1.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  ... -0.721400   \n",
       "2     1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "3     1.0  0.0  1.0  0.0  1.0  1.0  0.0  0.0  0.0  1.0  ... -0.721400   \n",
       "4     0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "...   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...       ...   \n",
       "3905  1.0  0.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "3906  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  ...  1.455047   \n",
       "3907  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  1.455047   \n",
       "3908  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0  1.0  ...  1.455047   \n",
       "3909  0.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ... -0.721400   \n",
       "\n",
       "            17        18        19        20        21        22        23  \\\n",
       "0    -1.020350  1.284279 -0.725271 -0.127737 -0.632399  0.679690  0.782966   \n",
       "1     0.690023  0.282777 -0.725271  1.511598 -0.632399  0.679690  0.782966   \n",
       "2    -1.020350  0.282777  1.771041  0.418708 -0.632399  0.679690  0.782966   \n",
       "3    -1.020350  1.284279 -0.725271 -0.127737 -0.632399  1.408395 -1.277194   \n",
       "4     2.400396 -1.720227 -0.725271  1.511598 -0.632399 -0.049015 -1.277194   \n",
       "...        ...       ...       ...       ...       ...       ...       ...   \n",
       "3905 -0.653841  1.284279 -0.725271 -0.674182 -0.632399 -1.506426  0.782966   \n",
       "3906 -0.898180 -0.718725  1.771041 -1.220627 -0.632399  1.408395  0.782966   \n",
       "3907  1.545210  0.282777 -0.725271  2.058043 -0.632399 -0.777720  0.782966   \n",
       "3908  1.789549  1.284279 -0.725271 -0.127737 -0.632399 -1.506426  0.782966   \n",
       "3909 -0.776011  0.282777 -0.725271 -1.220627  1.581280 -0.049015 -1.277194   \n",
       "\n",
       "            24        25  \n",
       "0    -0.382245 -0.774151  \n",
       "1    -0.459799  0.643615  \n",
       "2    -0.245196 -0.065268  \n",
       "3     0.213475 -0.065268  \n",
       "4    -0.024889  2.061382  \n",
       "...        ...       ...  \n",
       "3905 -0.536973  0.643615  \n",
       "3906  1.529609 -0.065268  \n",
       "3907 -0.360576  0.643615  \n",
       "3908 -0.252799  0.643615  \n",
       "3909 -1.082511 -1.483035  \n",
       "\n",
       "[3910 rows x 26 columns]"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(X_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3995    0\n",
       "2610    0\n",
       "3083    0\n",
       "3973    0\n",
       "4044    0\n",
       "       ..\n",
       "4426    0\n",
       "466     0\n",
       "3092    0\n",
       "3772    0\n",
       "860     1\n",
       "Name: ProdTaken, Length: 3910, dtype: int64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: xgboost in c:\\users\\win10\\anaconda3\\lib\\site-packages (1.6.2)\n",
      "Requirement already satisfied: numpy in c:\\users\\win10\\anaconda3\\lib\\site-packages (from xgboost) (1.19.2)\n",
      "Requirement already satisfied: scipy in c:\\users\\win10\\anaconda3\\lib\\site-packages (from xgboost) (1.5.2)\n"
     ]
    }
   ],
   "source": [
    "!pip install xgboost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.ensemble import AdaBoostClassifier\n",
    "from xgboost import XGBClassifier\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import accuracy_score, classification_report,ConfusionMatrixDisplay, \\\n",
    "                            precision_score, recall_score, f1_score, roc_auc_score,roc_curve "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Logisitic Regression\n",
      "Model performance for Training set\n",
      "- Accuracy: 0.8458\n",
      "- F1 score: 0.8200\n",
      "- Precision: 0.6994\n",
      "- Recall: 0.3032\n",
      "- Roc Auc Score: 0.6366\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.8354\n",
      "- F1 score: 0.8078\n",
      "- Precision: 0.6829\n",
      "- Recall: 0.2932\n",
      "- Roc Auc Score: 0.6301\n",
      "===================================\n",
      "\n",
      "\n",
      "Decision Tree\n",
      "Model performance for Training set\n",
      "- Accuracy: 1.0000\n",
      "- F1 score: 1.0000\n",
      "- Precision: 1.0000\n",
      "- Recall: 1.0000\n",
      "- Roc Auc Score: 1.0000\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.9192\n",
      "- F1 score: 0.9185\n",
      "- Precision: 0.8077\n",
      "- Recall: 0.7696\n",
      "- Roc Auc Score: 0.8626\n",
      "===================================\n",
      "\n",
      "\n",
      "Random Forest\n",
      "Model performance for Training set\n",
      "- Accuracy: 1.0000\n",
      "- F1 score: 1.0000\n",
      "- Precision: 1.0000\n",
      "- Recall: 1.0000\n",
      "- Roc Auc Score: 1.0000\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.9274\n",
      "- F1 score: 0.9221\n",
      "- Precision: 0.9545\n",
      "- Recall: 0.6597\n",
      "- Roc Auc Score: 0.8260\n",
      "===================================\n",
      "\n",
      "\n",
      "Gradient Boost\n",
      "Model performance for Training set\n",
      "- Accuracy: 0.8939\n",
      "- F1 score: 0.8819\n",
      "- Precision: 0.8756\n",
      "- Recall: 0.5021\n",
      "- Roc Auc Score: 0.7429\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.8589\n",
      "- F1 score: 0.8398\n",
      "- Precision: 0.7732\n",
      "- Recall: 0.3927\n",
      "- Roc Auc Score: 0.6824\n",
      "===================================\n",
      "\n",
      "\n",
      "Adaboost\n",
      "Model performance for Training set\n",
      "- Accuracy: 0.8565\n",
      "- F1 score: 0.8365\n",
      "- Precision: 0.7308\n",
      "- Recall: 0.3649\n",
      "- Roc Auc Score: 0.6670\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.8354\n",
      "- F1 score: 0.8115\n",
      "- Precision: 0.6630\n",
      "- Recall: 0.3194\n",
      "- Roc Auc Score: 0.6400\n",
      "===================================\n",
      "\n",
      "\n",
      "Xgboost\n",
      "Model performance for Training set\n",
      "- Accuracy: 0.9995\n",
      "- F1 score: 0.9995\n",
      "- Precision: 1.0000\n",
      "- Recall: 0.9973\n",
      "- Roc Auc Score: 0.9986\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.9356\n",
      "- F1 score: 0.9318\n",
      "- Precision: 0.9507\n",
      "- Recall: 0.7068\n",
      "- Roc Auc Score: 0.8490\n",
      "===================================\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "models={\n",
    "    \"Logisitic Regression\":LogisticRegression(),\n",
    "    \"Decision Tree\":DecisionTreeClassifier(),\n",
    "    \"Random Forest\":RandomForestClassifier(),\n",
    "    \"Gradient Boost\":GradientBoostingClassifier(),\n",
    "    \"Adaboost\":AdaBoostClassifier(),\n",
    "    \"Xgboost\":XGBClassifier()\n",
    "}\n",
    "for i in range(len(list(models))):\n",
    "    model = list(models.values())[i]\n",
    "    model.fit(X_train, y_train) # Train model\n",
    "\n",
    "    # Make predictions\n",
    "    y_train_pred = model.predict(X_train)\n",
    "    y_test_pred = model.predict(X_test)\n",
    "\n",
    "    # Training set performance\n",
    "    model_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "    model_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "    model_train_precision = precision_score(y_train, y_train_pred) # Calculate Precision\n",
    "    model_train_recall = recall_score(y_train, y_train_pred) # Calculate Recall\n",
    "    model_train_rocauc_score = roc_auc_score(y_train, y_train_pred)\n",
    "\n",
    "\n",
    "    # Test set performance\n",
    "    model_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "    model_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "    model_test_precision = precision_score(y_test, y_test_pred) # Calculate Precision\n",
    "    model_test_recall = recall_score(y_test, y_test_pred) # Calculate Recall\n",
    "    model_test_rocauc_score = roc_auc_score(y_test, y_test_pred) #Calculate Roc\n",
    "\n",
    "\n",
    "    print(list(models.keys())[i])\n",
    "    \n",
    "    print('Model performance for Training set')\n",
    "    print(\"- Accuracy: {:.4f}\".format(model_train_accuracy))\n",
    "    print('- F1 score: {:.4f}'.format(model_train_f1))\n",
    "    \n",
    "    print('- Precision: {:.4f}'.format(model_train_precision))\n",
    "    print('- Recall: {:.4f}'.format(model_train_recall))\n",
    "    print('- Roc Auc Score: {:.4f}'.format(model_train_rocauc_score))\n",
    "\n",
    "    \n",
    "    \n",
    "    print('----------------------------------')\n",
    "    \n",
    "    print('Model performance for Test set')\n",
    "    print('- Accuracy: {:.4f}'.format(model_test_accuracy))\n",
    "    print('- F1 score: {:.4f}'.format(model_test_f1))\n",
    "    print('- Precision: {:.4f}'.format(model_test_precision))\n",
    "    print('- Recall: {:.4f}'.format(model_test_recall))\n",
    "    print('- Roc Auc Score: {:.4f}'.format(model_test_rocauc_score))\n",
    "\n",
    "    \n",
    "    print('='*35)\n",
    "    print('\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Hyperparameter Training\n",
    "rf_params = {\"max_depth\": [5, 8, 15, None, 10],\n",
    "             \"max_features\": [5, 7, \"auto\", 8],\n",
    "             \"min_samples_split\": [2, 8, 15, 20],\n",
    "             \"n_estimators\": [100, 200, 500, 1000]}\n",
    "xgboost_params = {\"learning_rate\": [0.1, 0.01],\n",
    "                  \"max_depth\": [5, 8, 12, 20, 30],\n",
    "                  \"n_estimators\": [100, 200, 300],\n",
    "                  \"colsample_bytree\": [0.5, 0.8, 1, 0.3, 0.4]}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'max_depth': [5, 8, 15, None, 10],\n",
       " 'max_features': [5, 7, 'auto', 8],\n",
       " 'min_samples_split': [2, 8, 15, 20],\n",
       " 'n_estimators': [100, 200, 500, 1000]}"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rf_params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'learning_rate': [0.1, 0.01],\n",
       " 'max_depth': [5, 8, 12, 20, 30],\n",
       " 'n_estimators': [100, 200, 300],\n",
       " 'colsample_bytree': [0.5, 0.8, 1, 0.3, 0.4]}"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xgboost_params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Models list for Hyperparameter tuning\n",
    "randomcv_models = [\n",
    "                   (\"RF\", RandomForestClassifier(), rf_params),\n",
    "    (\"Xgboost\", XGBClassifier(), xgboost_params)\n",
    "                   \n",
    "                   ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('RF',\n",
       "  RandomForestClassifier(),\n",
       "  {'max_depth': [5, 8, 15, None, 10],\n",
       "   'max_features': [5, 7, 'auto', 8],\n",
       "   'min_samples_split': [2, 8, 15, 20],\n",
       "   'n_estimators': [100, 200, 500, 1000]}),\n",
       " ('Xgboost',\n",
       "  XGBClassifier(base_score=None, booster=None, callbacks=None,\n",
       "                colsample_bylevel=None, colsample_bynode=None,\n",
       "                colsample_bytree=None, early_stopping_rounds=None,\n",
       "                enable_categorical=False, eval_metric=None, gamma=None,\n",
       "                gpu_id=None, grow_policy=None, importance_type=None,\n",
       "                interaction_constraints=None, learning_rate=None, max_bin=None,\n",
       "                max_cat_to_onehot=None, max_delta_step=None, max_depth=None,\n",
       "                max_leaves=None, min_child_weight=None, missing=nan,\n",
       "                monotone_constraints=None, n_estimators=100, n_jobs=None,\n",
       "                num_parallel_tree=None, predictor=None, random_state=None,\n",
       "                reg_alpha=None, reg_lambda=None, ...),\n",
       "  {'learning_rate': [0.1, 0.01],\n",
       "   'max_depth': [5, 8, 12, 20, 30],\n",
       "   'n_estimators': [100, 200, 300],\n",
       "   'colsample_bytree': [0.5, 0.8, 1, 0.3, 0.4]})]"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "randomcv_models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 3 folds for each of 100 candidates, totalling 300 fits\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.\n",
      "[Parallel(n_jobs=-1)]: Done  98 tasks      | elapsed:   10.0s\n",
      "[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:   20.7s finished\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 3 folds for each of 100 candidates, totalling 300 fits\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.\n",
      "[Parallel(n_jobs=-1)]: Done  98 tasks      | elapsed:    3.7s\n",
      "[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:   11.6s finished\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---------------- Best Params for RF -------------------\n",
      "{'n_estimators': 200, 'min_samples_split': 2, 'max_features': 8, 'max_depth': None}\n",
      "---------------- Best Params for Xgboost -------------------\n",
      "{'n_estimators': 200, 'max_depth': 12, 'learning_rate': 0.1, 'colsample_bytree': 1}\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import RandomizedSearchCV\n",
    "\n",
    "model_param = {}\n",
    "for name, model, params in randomcv_models:\n",
    "    random = RandomizedSearchCV(estimator=model,\n",
    "                                   param_distributions=params,\n",
    "                                   n_iter=100,\n",
    "                                   cv=3,\n",
    "                                   verbose=2,\n",
    "                                   n_jobs=-1)\n",
    "    random.fit(X_train, y_train)\n",
    "    model_param[name] = random.best_params_\n",
    "\n",
    "for model_name in model_param:\n",
    "    print(f\"---------------- Best Params for {model_name} -------------------\")\n",
    "    print(model_param[model_name])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Random Forest\n",
      "Model performance for Training set\n",
      "- Accuracy: 1.0000\n",
      "- F1 score: 1.0000\n",
      "- Precision: 1.0000\n",
      "- Recall: 1.0000\n",
      "- Roc Auc Score: 1.0000\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.9305\n",
      "- F1 score: 0.9255\n",
      "- Precision: 0.9624\n",
      "- Recall: 0.6702\n",
      "- Roc Auc Score: 0.8319\n",
      "===================================\n",
      "\n",
      "\n",
      "Xgboost\n",
      "Model performance for Training set\n",
      "- Accuracy: 1.0000\n",
      "- F1 score: 1.0000\n",
      "- Precision: 1.0000\n",
      "- Recall: 1.0000\n",
      "- Roc Auc Score: 1.0000\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.9509\n",
      "- F1 score: 0.9490\n",
      "- Precision: 0.9554\n",
      "- Recall: 0.7853\n",
      "- Roc Auc Score: 0.8882\n",
      "===================================\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "models={\n",
    "    \n",
    "    \"Random Forest\":RandomForestClassifier(n_estimators=1000,min_samples_split=2,\n",
    "                                          max_features=7,max_depth=None),\n",
    "    \"Xgboost\":XGBClassifier(n_estimators=200,max_depth=12,learning_rate=0.1,\n",
    "                           colsample_bytree=1)\n",
    "}\n",
    "for i in range(len(list(models))):\n",
    "    model = list(models.values())[i]\n",
    "    model.fit(X_train, y_train) # Train model\n",
    "\n",
    "    # Make predictions\n",
    "    y_train_pred = model.predict(X_train)\n",
    "    y_test_pred = model.predict(X_test)\n",
    "\n",
    "    # Training set performance\n",
    "    model_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "    model_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "    model_train_precision = precision_score(y_train, y_train_pred) # Calculate Precision\n",
    "    model_train_recall = recall_score(y_train, y_train_pred) # Calculate Recall\n",
    "    model_train_rocauc_score = roc_auc_score(y_train, y_train_pred)\n",
    "\n",
    "\n",
    "    # Test set performance\n",
    "    model_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "    model_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "    model_test_precision = precision_score(y_test, y_test_pred) # Calculate Precision\n",
    "    model_test_recall = recall_score(y_test, y_test_pred) # Calculate Recall\n",
    "    model_test_rocauc_score = roc_auc_score(y_test, y_test_pred) #Calculate Roc\n",
    "\n",
    "\n",
    "    print(list(models.keys())[i])\n",
    "    \n",
    "    print('Model performance for Training set')\n",
    "    print(\"- Accuracy: {:.4f}\".format(model_train_accuracy))\n",
    "    print('- F1 score: {:.4f}'.format(model_train_f1))\n",
    "    \n",
    "    print('- Precision: {:.4f}'.format(model_train_precision))\n",
    "    print('- Recall: {:.4f}'.format(model_train_recall))\n",
    "    print('- Roc Auc Score: {:.4f}'.format(model_train_rocauc_score))\n",
    "\n",
    "    \n",
    "    \n",
    "    print('----------------------------------')\n",
    "    \n",
    "    print('Model performance for Test set')\n",
    "    print('- Accuracy: {:.4f}'.format(model_test_accuracy))\n",
    "    print('- F1 score: {:.4f}'.format(model_test_f1))\n",
    "    print('- Precision: {:.4f}'.format(model_test_precision))\n",
    "    print('- Recall: {:.4f}'.format(model_test_recall))\n",
    "    print('- Roc Auc Score: {:.4f}'.format(model_test_rocauc_score))\n",
    "\n",
    "    \n",
    "    print('='*35)\n",
    "    print('\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEWCAYAAAB42tAoAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAA+2UlEQVR4nO3deZxN9f/A8debiOxbJftujDWiRJEWKmkRSkWbFInSolX0Vdq+lJCvvmmzFKWSfqWiRPaQXckypW/2LWqY9++Pz5lxjTt3jjH3nlnez8fjPubes77PmZnzvufz+ZzPR1QVY4wxJi15gg7AGGNM1maJwhhjTESWKIwxxkRkicIYY0xEliiMMcZEZInCGGNMRJYozAkRkZUi0iroOLIKEXlURMYGtO9xIvJMEPvObCLSVUS+zOC69jcZZZYosjER2SgiB0Vkv4j84V04Ckdzn6oar6qzormPZCJyqog8KyKbveNcLyIPiojEYv9h4mklIgmh01R1iKreEaX9iYj0EZEVInJARBJE5AMRqReN/WWUiAwUkXdPZhuq+p6qXupjX8clx1j+TeZWliiyv/aqWhhoCDQCBgQbzokTkVPSmPUB0Aa4HCgC3Az0AIZHIQYRkaz2/zAcuA/oA5QEagJTgSsye0cRfgdRF+S+jU+qaq9s+gI2AheHfH4e+Czk87nAXGA3sAxoFTKvJPAm8DuwC5gaMu9KYKm33lygfup9AmcBB4GSIfMaAduBfN7n24DV3va/ACqFLKtAL2A98GuYY2sDHAIqpJreDDgCVPc+zwKeBRYAe4CPU8UU6RzMAv4FzPGOpTpwqxfzPmADcJe3bCFvmSRgv/c6CxgIvOstU9k7rm7AZu9cPBayv4LAW975WA08BCSk8but4R1n0wi//3HAa8BnXrzzgWoh84cDW4C9wGKgZci8gcBk4F1v/h1AU+AH71xtBUYA+UPWiQdmADuB/wGPAm2Bf4BE75ws85YtBrzhbec34Bkgrzevu3fO/+1t6xlv2vfefPHm/en9TpcDdXFfEhK9/e0HPk39fwDk9eL6xTsni0n1N2SvDFxrgg7AXifxyzv2H6Q88BMw3PtcDtiB+zaeB7jE+1zGm/8ZMAkoAeQDLvSmn+39gzbz/um6efs5Ncw+vwHuDInnBWC09/5q4GcgDjgFeByYG7KsehedkkDBMMf2HPBtGse9iaMX8Fnehagu7mI+haMX7vTOwSzcBT3eizEf7tt6Ne9idSHwF3C2t3wrUl3YCZ8o/oNLCg2Av4G40GPyznl53AUwrUTRE9iUzu9/HO5C29SL/z1gYsj8m4BS3rwHgD+AAiFxJ3q/pzxevI1xifUU71hWA3295YvgLvoPAAW8z81Sn4OQfU8FXvd+J6fjEnny76w7cBi419tXQY5NFJfhLvDFvd9DHFA25JififB/8CDu/6CWt24DoFTQ/6vZ/RV4APY6iV+e+wfZj/vmpMDXQHFv3sPAO6mW/wJ34S+L+2ZcIsw2RwGDU01by9FEEvpPeQfwjfdecN9eL/A+fw7cHrKNPLiLbiXvswIXRTi2saEXvVTz5uF9U8dd7J8LmVcH940zb6RzELLuoHTO8VTgPu99K/wlivIh8xcAXbz3G4DLQubdkXp7IfMeA+alE9s4YGzI58uBNRGW3wU0CIn7u3S23xf4yHt/A/BjGsulnAPv8xm4BFkwZNoNwEzvfXdgc6ptdOdoorgIWIdLWnnCHHOkRLEW6HCy/1v2OvaV1cpkzYm7WlWL4C5itYHS3vRKwPUisjv5BbTAJYkKwE5V3RVme5WAB1KtVwFXzJLaZOA8ETkLuAB3kZwdsp3hIdvYiUsm5ULW3xLhuLZ7sYZT1psfbjubcHcGpYl8DsLGICLtRGSeiOz0lr+co+fUrz9C3v8FJDcwOCvV/iId/w7SPn4/+0JEHhCR1SKyxzuWYhx7LKmPvaaITPMaRuwFhoQsXwFXnONHJdzvYGvIeX8dd2cRdt+hVPUbXLHXa8D/RGSMiBT1ue8TidP4ZIkih1DVb3Hftl70Jm3BfZsuHvIqpKrPefNKikjxMJvaAvwr1XqnqeqEMPvcDXwJdAJuBCao97XO285dqbZTUFXnhm4iwiF9BTQTkQqhE0WkKe5i8E3I5NBlKuKKVLancw6Oi0FETsUVXb0InKGqxYHpuASXXrx+bMUVOYWLO7WvgfIi0iQjOxKRlrg7qk64O8fiuPL+0BZjqY9nFLAGqKGqRXFl/cnLb8EVyYWTejtbcHcUpUPOe1FVjY+wzrEbVH1FVRvjigVr4oqU0l0vnThNBlmiyFmGAZeISENcJWV7EblMRPKKSAGveWd5Vd2KKxoaKSIlRCSfiFzgbeM/QE8Raea1BCokIleISJE09jkeuAW4znufbDQwQETiAUSkmIhc7/dAVPUr3MVyiojEe8dwLq4cfpSqrg9Z/CYRqSMipwGDgMmqeiTSOUhjt/mBU4FtwGERaQeENtn8H1BKRIr5PY5U3sedkxIiUg7ondaC3vGNBCZ4Mef34u8iIo/42FcRXD3ANuAUEXkSSO9beRFcxfZ+EakN3B0ybxpwpoj09ZotFxGRZt68/wGVk1uNeX9fXwIviUhREckjItVE5EIfcSMi53h/f/mAA7hGDUdC9lU1wupjgcEiUsP7+60vIqX87NekzRJFDqKq24C3gSdUdQvQAfetcBvum9aDHP2d34z75r0GV3nd19vGIuBO3K3/LlyFdPcIu/0E10Lnf6q6LCSWj4ChwESvGGMF0O4ED+k6YCbwf7i6mHdxLWnuTbXcO7i7qT9wFa19vBjSOwfHUNV93rrv4479Ru/4kuevASYAG7wilXDFcZEMAhKAX3F3TJNx37zT0oejRTC7cUUq1wCf+tjXF7gvA+twxXGHiFzUBdAfd8z7cF8YJiXP8M7NJUB73HleD7T2Zn/g/dwhIku897fgEu8q3LmcjL+iNHAJ7T/eeptwxXDJd8pvAHW88z81zLov435/X+KS3hu4ynJzEuRoSYEx2Y+IzMJVpAbydPTJEJG7cRXdvr5pGxMUu6MwJkZEpKyInO8VxdTCNTX9KOi4jEmPPRFpTOzkx7X+qYIrSpqIq4cwJkuzoidjjDERWdGTMcaYiLJd0VPp0qW1cuXKQYdhjDHZyuLFi7erapmMrJvtEkXlypVZtGhR0GEYY0y2IiKbMrquFT0ZY4yJyBKFMcaYiCxRGGOMicgShTHGmIgsURhjjInIEoUxxpiIopYoROS/IvKniKxIY76IyCsi8rOILBeRs6MVizHGmIyL5h3FONzA62lph+ueugZu0PRRUYzFGGNMBkXtgTtV/U5EKkdYpAPwtjci2jwRKS4iZb1BT4zJscbP38zHS38LOgyTS8St+5Ervp54UtsI8snschw7kEqCN+24RCEiPXB3HVSsWDEmwRmT2ZITxPxfdwLQrErJgCMyOVmRfbu4acprtJo3nT9L+R0zKrwgE4WEmRa2K1tVHQOMAWjSpIl1d2uyHD93CaEJokPDctzYzL70mCi67jpY9CUMGMDpjz8OhQpleFNBJooEjh1cvjzwe0CxGANkvFjIz12CJQgTdStXQvHiUK4cDB0KgwZBfPxJbzbIRPEJ0FtEJgLNgD1WP5E15aYy9YwWC1kSMIE6cAAGD4aXXoKuXWHcOKhePdM2H7VEISITgFZAaRFJAJ4C8gGo6mhgOnA58DPwF3BrtGIxxzuRi39uKlO3C77Jdj77DHr1gk2b4Lbb3J1EJotmq6cb0pmvQK9o7d9ETgYncvG3i6cxWdTIkS5J1KkD330HLVtGZTfZbjwK49/HS39j1da91Clb9Lh5dvE3Jps6fBi2bYOyZaFTJzh4EO69F/Lnj9ouLVHkEOHuHpKTxKS7zgsoKmNMplqwAO66C045BebNg9Kl4YEHor5bSxRZUEYqj8MVJdUpW5QODctlamzGmADs3g2PPgqjR7s7ieHDIU/suuqzRJEFRSoySosVJRmTQ/30E1xyiStu6tPHNXkt6v/akBksUQTMioyMMWElJkK+fFCzJrRuDQ8+CGcH03eqJYpUYv3MgBUZGWOO8fffronru+/CkiVQuDBMmBBoSLk+UaRODLF+ZsCKjIwxKb75Bu6+G9atg86dXdIoXDjoqCxRJHfSlpwY7MJtjIm5gwehRw93F1G1Kvzf/8FllwUdVYpcmShC7yKSk4TVBxhjAlOgAGzfDo8/7lo3FSwYdETHyJVDoabu6tnqA4wxMbd8ubtrSEgAEdcVx+DBWS5JQC67owgdD8DuIowxgThwAAYOhH//G0qUgPXroXz5mD4XcaJyfKJIXcwEdhdhjAnIJ5+47jY2b4Y774TnnoOSWb+zzRyZKNJKDlZRbYwJ1NSp7mG577+H888POhrfcmSiCH2y2ZKDMSYwiYnwyivugbmzz3ZdbxQo4B6ky0ZyXKIYP3+z1UEYY4I3b57rwG/5cnj4YZcoihQJOqoMybq1Jxkwfv5mHv3oJwCrgzDGBGPXLujZE5o3h5074aOP4Nlng47qpOSoRJFcLzHkmnpW1GSMCcaYMTB2LPTrB6tWwdVXu+av2ViOK3pqVqWkJQljTGytXet6d23RAvr2hXbtoH79oKPKNDnmjiK5bsIYY2Lm0CF46imXFHr1AlU49dQclSQgByWK5GInq5swxsTEjBlQr54bH6JjR/jii2xfxJSWiEVPIlIAuBJoCZwFHARWAJ+p6sroh+dPaEsnK3YyxkTdd9/BpZdCjRouYVx8cdARRVWadxQiMhCYA5wHzAdeB94HDgPPicgMEckS91d2N2GMibojR9xocwAtW8Ibb7imrzk8SUDkO4qFqjowjXkvi8jpQJb5+m53E8aYqPnxR9fkdfVq1zfTGWfAbbcFHVXMpHlHoaqfhX4WkUKp5v+pqouiFZgxxgRu3z64/35o0gQ2boRRo+D004OOKubSrcwWkeYisgpY7X1uICIjox6ZT9bayRgTFXv2QHy86+X1zjthzRro2jXHVlhH4uc5in8DlwGfAKjqMhG5IKpRnQCrnzDGZKq9e13HfcWKuVHn2rSB83J3d0C+mseq6pZUk45EIZYTZq2djDGZJjERnn/ejQ2xZImb9vjjuT5JgL87ii0i0hxQEckP9MErhgqa3U0YYzLFnDmusnrFCtflRpkyQUeUpfi5o+gJ9ALKAQlAQ+CeKMZ0QuxuwhhzUu6913W9sWcPfPyx68SvQoWgo8pS/NxR1FLVrqETROR83DMWxhiT/agerZQ+80zo3991xVG4cLBxZVF+7ihe9TnNGGOyvjVr3EBCH3/sPj/2GLzwgiWJCNK8oxCR84DmQBkRuT9kVlEgb7QDM8aYTHXwIAwZAkOHQqFC7rPxJdIdRX6gMC6ZFAl57QU6+tm4iLQVkbUi8rOIPBJmfjER+VRElonIShG59cQPwRhj0vH1164Dv2eegS5dXLfgXboEHVW2keYdhap+C3wrIuNUddOJblhE8gKvAZfgKsEXisgnqroqZLFewCpVbS8iZYC1IvKeqv5zovszxpg0JSTAKae4hHHRRUFHk+34qcz+S0ReAOKBAskTVTW9s90U+FlVNwCIyESgAxCaKBQoIiKCu3vZiet00BhjMu7IERg9GvLnd09V33KLu4M49dSgI8uW/FRmvwesAaoATwMbgYU+1isHhD6ol+BNCzUCiAN+B34C7lPVpNQbEpEeIrJIRBZt27bNx66NMbnWkiVw7rnQu7cbIwJcCydLEhnmJ1GUUtU3gERV/VZVbwPO9bFeuA5RNNXny4CluLEuGgIjRKTocSupjlHVJqrapIw9CGOMCWfvXrjvPjjnHNiyBSZMgA8+CDqqHMFPokj0fm4VkStEpBFQ3sd6CUDoUyvlcXcOoW4FPlTnZ+BXoLaPbVtngMaYYy1bBiNGuCes16xxRU25sAO/aPBTR/GMiBQDHsA9P1EU6OtjvYVADRGpAvwGdAFuTLXMZqANMFtEzgBqARvS2/D4+Zt59CM3gIh132FMLvbrrzBzphsbomVL+PlnqFIl6KhynHQThapO897uAVpDypPZ6a13WER6A1/gnrv4r6quFJGe3vzRwGBgnIj8hCuqelhVt6e37eQ+noZcU8+67zAmN/rnH3jpJTdedYECcM01UKKEJYkoifTAXV6gE64C+v9UdYWIXAk8ChQEGqW3cVWdDkxPNW10yPvfgUszErj18WRMLjV7titeWrUKrr0Whg93ScJETaQ7ijdwdQwLgFdEZBNu/OxHVHVqDGIzxphjbdsGl17qhiL99FO48sqgI8oVIiWKJkB9VU0SkQLAdqC6qv4Rm9CMMQbXgd9XX8Ell7juv6dNc81fCxVKf12TKSK1evon+ZkGVT0ErLMkYYyJqZUr4cIL3V3ErFluWps2liRiLFKiqC0iy73XTyGffxKR5bEKMLWdB/6xZrHG5HR//QWPPgoNG7pkMXYsXJBlRmDOdSIVPcXFLIoTsPuvRIpizWKNybFUXTfgCxZAt26uC3B70DZQkToFPOGOAGPFWjwZkwNt3Qqnnw5587q7iWLFoFWroKMy+Hsy2xhjoufIEXjlFahVC0aOdNM6dLAkkYVYojDGBGfRImja1PXR1Lw5XH550BGZMHwlChEpKCK1oh2MMSYXef55lyS2boVJk+Dzz6FataCjMmGkmyhEpD2uh9f/8z43FJFPohyXMSYnUoVEr5/Rpk2hVy9YvRo6dbIO/LIwP3cUA3GDEO0GUNWlQOVoBWSMyaF++QXatoVHvFGRW7WCV191ldYmS/OTKA6r6p6oR2KMyZn+/tuNVV23LvzwgxUvZUN+uhlfISI3AnlFpAbQB5gb3bCMMTnC4sVw001ufIjrr4dhw+Css4KOypwgP3cU9+LGy/4bGI/rbrxvFGMyxuQUhQu7uofp0+H99y1JZFN+7ihqqepjwGPRDsYYk80lJcGbb7oiprFj3bMRK1ZAHmuJn535+e29LCJrRGSwiMRHPSJjTPa0YoXrj+mOO2D9ejhwwE23JJHtpfsbVNXWQCtgGzDG6xTw8WgHZozJJg4cgIcfhkaNXF3Em2+6nl6th9ccw1eqV9U/VPUVoCfumYonoxmUMSYbOXTIJYdbboG1a6F7d3smIofx88BdnIgMFJEVwAhci6fyUY/MGJN1JSTAQw+5fppKlXJ3Em+84d6bHMfPHcWbwC7gUlW9UFVHqeqfUY7LGJMVHT4M//43xMXBiBGwdKmbXrJkoGGZ6Eq31ZOqnhuLQIwxWdz8+XDXXbBsmeu8b8QIqFIl6KhMDKSZKETkfVXt5I1up6GzAFXV+lGPzhiTNSQlwa23wp49MHkyXHut1UPkIpHuKO7zfl4Zi0CMMVmMqksKbdtCkSLw4YdQrpx7b3KVNOsoVHWr9/YeVd0U+gLuiU14xphArF8Pl13menUdM8ZNq13bkkQu5acy+5Iw09pldiDGmCzg779h0CCoV8/VSYwYAX37Bh2VCVikOoq7cXcOVUVkecisIsCcaAdmjAlAr16umWuXLvDyy1C2bNARmSwgUh3FeOBz4FngkZDp+1R1Z1SjMsbEzp9/usrqM890T1hff70rdjLGE6noSVV1I9AL2BfyQkSs0bQx2V1Skqt/qFXLjVkNUKOGJQlznPTuKK4EFuOax4a2hVOgahTjMsZE0/Ll0LOn6+W1VSt4+umgIzJZWJqJQlWv9H7aEzXG5CSTJ7s6iBIl4O233cBC9kyEicBPX0/ni0gh7/1NIvKyiFSMfmjGmEy1d6/72aqVq7ReuxZuvtmShEmXn+axo4C/RKQB8BCwCXgnqlEZYzLP5s3QoQO0aeM68StdGoYPt/6ZjG9+EsVhVVWgAzBcVYfjmsimS0TaishaEflZRB5JY5lWIrJURFaKyLf+QzfGRJSYCC++6Drw++or9/CcavrrGZOKn6FQ94nIAOBmoKWI5AXypbeSt9xruAf2EoCFIvKJqq4KWaY4MBJoq6qbReT0DByDMSa1TZvgqqtcpXX79vDqq1CpUtBRmWzKzx1FZ+Bv4DZV/QMoB7zgY72mwM+qukFV/wEm4u5KQt0IfKiqmwGs+3JjTlLyHcOZZ8IZZ8BHH8HHH1uSMCfFz1CofwDvAcVE5ErgkKq+7WPb5YAtIZ8TvGmhagIlRGSWiCwWkVt8xm2MCaUK774L55wD+/fDqafCl1/C1VdbZbU5aX5aPXUCFgDXA52A+SLS0ce2w/11pi4gPQVoDFwBXAY8ISI1w8TQQ0QWiciixMREH7s2JhdZu9ZVVN98M5xyCuzYEXREJofxU0fxGHBOcrGQiJQBvgImp7NeAlAh5HN54Pcwy2xX1QPAARH5DmgArAtdSFXHAGMASlaKs9o4Y8CNNjd4MDz3HBQsCKNGQY8ekMdPibIx/vn5i8qTqu5gh8/1FgI1RKSKiOQHugCfpFrmY1wF+SkichrQDFjtY9vGmLx5YfZs6NjR3VX07GlJwkSFnzuK/xORL4AJ3ufOwPT0VlLVwyLSG/gCyAv8V1VXikhPb/5oVV0tIv8HLAeSgLGquiIjB2JMrvDHH/Doo67LjQoVYPp0KFAg6KhMDudnzOwHReRaoAWu3mGMqn7kZ+OqOp1USUVVR6f6/AL+WlEZk3sdOeI68BswAA4ehHbtXKKwJGFiINJ4FDWAF4FqwE9Af1X9LVaBGWM8P/7oipUWLHCV1iNHQs3j2nwYEzWRCjT/C0wDrsP1IPtqTCIyxhxrxAjYuBHeew9mzLAkYWIuUtFTEVX9j/d+rYgsiUVAxuR6qjB1KlSuDI0auW44XnzR9fZqTAAi3VEUEJFGInK2iJwNFEz12RiT2TZudF1vXHstDBvmppUoYUnCBCrSHcVW4OWQz3+EfFbgomgFZUyuk5joxqh++mnXxPXFF4+OOmdMwCINXNQ6loEYk6u9/jo88ojrcmP4cKhoQ76YrMPPcxTGmGjYscMVNTVuDHfeCdWrQ9u2QUdlzHHsMU5jYk0V3noLateG6693XXGceqolCZNlWaIwJpZWr4bWraF7d6hRw7VuOsVu7E3Wlu5fqIgI0BWoqqqDvPGyz1TVBVGPzpicZNky1w144cLuKevbb7e+mUy24OevdCRwHnCD93kfbuQ6Y4wfCQnuZ/36rlXTmjWuTsKShMkm/PylNlPVXsAhAFXdBeSPalTG5AS//w6dO7sxq3/7zQ0gNGAAnG4j/prsxU+iSPTGv1ZIGY8iKapRGZOdHTniut2Ii3PDkD70EJQuHXRUxmSYn1q0V4CPgNNF5F9AR+DxqEZlTHZ16BBccAEsXAiXXOI68KtePeiojDkpfroZf09EFgNtcN2MX62qNriQMaESEyFfPtftd+vWcP/9rtjJxqs2OYCfMbMrAn8Bn+JGqDvgTTPGqMLkye6uYYnXb+bQodCliyUJk2P4KXr6DFc/IUABoAqwFoiPYlzGZH0bNkDv3vD5566XV2vFZHIoP0VP9UI/ez3H3hW1iIzJDl5+GR57zD0sN2wY9OplD86ZHOuE/7JVdYmInBONYIzJNvbvh8svdx34lS8fdDTGRJWfJ7PvD/mYBzgb2Ba1iIzJirZvhwcfhGuuceNFPP64FTWZXMPPHUWRkPeHcXUWU6ITjjFZTFISjBvnksTevVDPK4m1JGFykYiJwnvQrrCqPhijeIzJOlatgp49YfZsaNECRo+GeGvDYXKfiIlCVY/YsKcm11q0CFauhDfecL292l2EyaXSTBQicoqqHgaWisgnwAfAgeT5qvphDOIzJramT3cDCt18s3tdeSWULBl0VMYEKtJXpORuxEsCO3BjZLf3XldGOS5jYishATp2hCuucP00qboH5ixJGBOx6EkAVPXWGMViTOwdPgyvveZaMR0+DP/6F/Tvb09VGxMiUqIok6pp7DFU9eUoxGNMbC1eDH37umFIX3sNqlYNOiJjspxIiSIvUBjvzsKYHGPPHvj6a7j2WmjWDObPdyPP2V2EMWFFShRbVXVQzCIxJtpU4f333R3Ejh2wcSOcdRY0bRp0ZMZkaZEqs+3rlck5fvkF2rVzvbqWKwdz57okYYxJV6Q7iqvSW1lECqvq/kyMx5jMt28fNG7snrJ+5RW45x7ImzfoqIzJNiLdUYwTkZdE5AIRKZQ8UUSqisjtIvIF0Db6IRqTQcuXu59FiriH5lavhnvvtSRhzAlKM1Goahvga1yX4itFZI+I7ADeBc4Euqnq5NiEacwJ2LYNunWDBg3cA3QA113nipyMMScsvS48pgPTM7pxEWkLDMe1oBqrqs+lsdw5wDygsyUfk2FJSfDf/8JDD7luwB99FFq1CjoqY7I9P0OhThaRy0XkhDq68ToUfA1oB9QBbhCROmksNxT44kS2b8xxrrsO7rzT9fC6dKl7eO6004KOyphsz8/FfzTQFVgvIs+JSG2f224K/KyqG1T1H2Ai0CHMcvfiui3/0+d2jTnqwAH3RDXADTe4LsFnzYI6x30nMcZkULqJQlW/UtWuuAGLNgIzRGSuiNwqIvkirFoO2BLyOcGblkJEygHX4JJRmkSkh4gsEpFFiYmJ6YVscotPP3UJYeRI97lTJ1c3YQ/OGZOpfBUniUgpoDtwB/Ajrt7hbGBGpNXCTNNUn4cBD6vqkUj7V9UxqtpEVZvkyxcpN5lcYcsW91T1VVe5Fk2NGwcdkTE5mp+hUD8EagPvAO1Vdas3a5KILIqwagJQIeRzeeD3VMs0ASaK+wZYGrhcRA6r6lR/4Ztc59133WBCSUnw3HPQrx/kzx90VMbkaH6GQh3rtX5KISKnqurfqtokwnoLgRoiUgX4DegC3Bi6gKpWCdnmOGCaJQkTVnK33+XLu5ZMr74KVaqku5ox5uT5KXp6Jsy0H9JbyRv0qDeuNdNq4H1VXSkiPUWk54mFaXKt3bvh7rvdmNXgksS0aZYkjImhSCPcnYmrfC4oIo04WudQFPDV5jDccxiqGrbiWlW7+9mmySVUYcIEuP9+9wBdv35H7yqMMTEVqejpMlwFdnkgdOyJfcCjUYzJ5Ha//go9esBXX7nuvz//HBo1CjoqY3KtNBOFqr4FvCUi16nqlBjGZHK7xETXT9Nrr8Fdd1nfTMYELFLR002q+i5QOdxIdzbCnclUX38Nn30GL78MNWvCpk1QoEDQURljiFyZndxjbGGgSJiXMSfvf/+Dm26Ciy+GTz5xAwqBJQljspBIRU+ve29Hquq2GMVjcoukJPjPf+CRR1w3HE88AQMGQMGCQUdmjEnFz3MUc0XkV2AS8KGq7opyTCY32LMHHn8cGjaEUaOgtt8uxIwxseanr6cawONAPLBYRKaJyE1Rj8zkPPv3uzqII0egRAmYPx+++caShDFZnK++nlR1garej+sRdifwVlSjMjnPxx+7DvweeAC+/dZNq1rVnoswJhvwMx5FURHpJiKfA3OBrbiEYUz6Nm2CDh3g6quheHGYMwcuuijoqIwxJ8BPHcUyYCowSFXT7brDmBSq0LEjrFoFzz8PffuC9f5rTLbjJ1FUVdXU3YMbk7Z58yA+3nUBPmYMlCwJlSoFHZUxJoPSLHoSkWHe209E5LhXbMIz2crOne5J6vPOgxdfdNMaNbIkYUw2F+mO4h3v54uxCMRkY6punIgHHnDJ4oEHjvb2aozJ9iI9cLfYe9tQVYeHzhOR+4BvoxmYyUYefdQNInTuuTBjBjRoEHRExphM5Kd5bLcw07pnchwmuzl0CLZvd+9vvdU9NDdnjiUJY3KgSJ0C3oAbka5KqjqJIsCOaAdmsrAZM+Cee6BuXfjoI9eJX82aQUdljImSSHUUyc9MlAZeCpm+D1gezaBMFvXHH24goQkToEYN6N076IiMMTEQqY5iE7AJOC924Zgsa+ZMuOYaOHgQBg6Ehx+2Hl6NySUiFT19r6otRGQfEPochQCqqkWjHp0JXmKie0iufn245BL417+smMmYXCbSHUUL76eNPZEb7dsHTz4JP/zgKqlLlYIPPgg6KmNMAPz09VRNRE713rcSkT4iUjzqkZlgqMKHH0JcHAwf7h6Y+/vvoKMyxgTIT/PYKcAREakOvAFUAcZHNSoTjO3boX17uO46KF0a5s51zV5POy3oyIwxAfKTKJJU9TBwDTBMVfsBZaMblglEkSJuaNKXX4ZFi9wDdMaYXM9Pokj0nqnoBkzzplkXoDnF999Du3ZuUKFTT3WDCfXrB6f46S/SGJMb+EkUt+KayP5LVX8VkSrAu9ENy0Tdjh1wxx3QsqXrBnzDBjc9j6+xrIwxuUi6XxtVdRXQJ+Tzr8Bz0QzKRJEqvPUW9O8Pu3e7zvueegoKFQo6MmNMFpVuohCR84GBQCVv+eTnKKpGNzQTNW+/DbVqwejRUK9e0NEYY7I4PwXRbwD9gMXAkeiGY6Li4EHXu+udd0L58jBlChQrZsVMxhhf/CSKPar6edQjMdHxxReuA78NG+D006FXLyhRIuiojDHZiJ9EMVNEXgA+BFKevFLVJVGLypy83393rZfef98VM33zDbRuHXRUxphsyE+iaOb9bBIyTYGLMj8ck2meeQY+/hgGDYKHHnJNX40xJgP8tHqyr6HZxeLFRzvwGzzYdQlevXrQURljsjk/fT2dISJviMjn3uc6InK7n42LSFsRWSsiP4vII2HmdxWR5d5rrojY8GgZsXcv9OkDTZu6YUnBdeJnScIYkwn8NHsZB3wBnOV9Xgf0TW8lEckLvAa0A+oAN4hInVSL/QpcqKr1gcHAGF9RG0fV9ehauzaMGAF33w3v2rOQxpjM5SdRlFbV94EkAK/fJz/NZJsCP6vqBlX9B5gIdAhdQFXnquou7+M8oLzvyA2MHw+dOsGZZ7quN0aMgOLFg47KGJPD+KnMPiAipfAGLxKRc4E9PtYrB2wJ+ZzA0YrxcG4HwjbDFZEeQA+AwmWr+dh1DvbPP66pa+3a0LGje0aie3frm8kYEzV+ri73A58A1URkDlAG6OhjPQkzTcNMQ0Ra4xJFi3DzVXUMXrFUyUpxYbeRK3z3HfTs6TrwW7fODUV6xx1BR2WMyeH8tHpaIiIXArVwF/+1qproY9sJQIWQz+WB31MvJCL1gbFAO1Xd4Svq3Gb7dtcn07hxULmy63rDxqs2xsRIpDGzzwG2qOofqnpYRBoD1wGbRGSgqu5MZ9sLgRpeb7O/AV2AG1PtoyLuQb6bVXXdyRxIjrVhA5xzjmvZ9Mgj8MQTNpCQMSamIlVmvw78AyAiF+B6jH0bVz+Rbuskr9K7N67F1GrgfVVdKSI9RaSnt9iTQClgpIgsFZFFGT6SnGbvXvezShW49Vb48Ud49llLEsaYmItU9JQ35K6hMzBGVacAU0RkqZ+Nq+p0YHqqaaND3t8BWCF7qL/+cg/LjRkDy5a5TvxefDHoqIwxuVikO4q8IpKcSNoA34TMsyY20fDZZxAf73p67dABChYMOiJjjIl4wZ8AfCsi24GDwGwAEamOv+axUXHgn8NB7Tp6Dh+GG26AyZMhLg6+/RYuuCDoqIwxBoiQKFT1XyLyNVAW+FJVk5ul5gHujUVwaenQsFyQu888qiDinoE44wwYMgQeeADy5w86MmOMSSFHr//ZQ8lKcbpz0+qgwzh5Cxe6sSFGj4azzw46GmNMDicii1W1SfpLHs+GOIu1PXugd29o1gwSEmCHPTpijMnaLFHEUnIHfqNGuWSxZg1ccknQURljTETWeimWVq+GcuXg00+hSYbuAI0xJuasjiKa/v4bXngBGjSA9u0hMRHy5IG8eYOOzBiTy1gdRVY0c6ZLEE88AV9/7ably2dJwhiT7ViiyGx//gndusFFF7k7iM8/h2HDgo7KGGMyzBJFZvvyS5gwAR57DFasgLZtg47IGGNOilVmZ4affoK1a91AQl27QvPmULVq0FEZY0ymsDuKk3HgADz0EDRq5H4mJronrS1JGGNyELujyKhPP3XPQmzeDLffDkOHuspqk2MlJiaSkJDAoUOHgg7FmDQVKFCA8uXLky8Tr0eWKDJixQq46irX0+vs2dAi7AiuJodJSEigSJEiVK5cGZFwI/0aEyxVZceOHSQkJFClSpVM264VPfl1+DDMmuXe160L06a5wYQsSeQahw4dolSpUpYkTJYlIpQqVSrT73otUfgxf757krpNG1i/3k274gorasqFLEmYrC4af6OWKCLZtQvuvhvOOw+2b3d9NVWvHnRUxhgTU5Yo0vL3364105gx0Lev66fp2mtdqyZjArBlyxaqVKnCzp1uhOJdu3ZRpUoVNm3aFHG9ypUrs3379qjEtHTpUqZPnx523qxZsyhWrBiNGjWidu3a9O/f/5j5U6dOpX79+tSuXZt69eoxderUY+a/+OKL1K5dm7p169KgQQPefvvtsPvp27cv3333XaYcTzQsXryYevXqUb16dfr06UO4bpMSExPp1q0b9erVIy4ujmeffTZl3qRJk6hfvz7x8fE89NBDKdNHjBjBm2++GZNjQFWz1atExdoaVQkJR9+/+abqkiXR3Z/JNlatWhV0CDp06FC98847VVW1R48eOmTIkHTXqVSpkm7bti0q8bz55pvaq1evsPNmzpypV1xxhaqq/vXXX1qrVi39/vvvVVV16dKlWq1aNd2wYYOqqm7YsEGrVaumy5YtU1XVUaNG6aWXXqp79uxRVdXdu3fruHHjjtvHjh07tFmzZicUc2Ji4gktf7LOOeccnTt3riYlJWnbtm11+vTpxy3z3nvvaefOnVVV9cCBA1qpUiX99ddfdfv27VqhQgX9888/VVX1lltu0a+++ipluYYNG4bdZ7i/VWCRZvC6a62ekh065Jq4DhkC77/vxqzu3j3oqEwW9fSnK1n1+95M3Wads4ryVPv4iMv069ePxo0bM2zYML7//nteffVVAJKSkujduzfffvstVapUISkpidtuu42OHTsC8MILLzBz5kwAxo8fT/Xq1dm0aRO33XYb27Zto0yZMrz55ptUrFgxzekffPABTz/9NHnz5qVYsWJ89dVXPPnkkxw8eJDvv/+eAQMG0Llz57BxFyxYkIYNG/Lbb78B7m7h0UcfTWmZU6VKFQYMGMALL7zAO++8w5AhQ5g5cyZFixYFoFixYnTr1u247U6ePJm2Ib0fDBo0iE8//ZSDBw/SvHlzXn/9dUSEVq1a0bx5c+bMmcNVV11Fq1atuP/++9m/fz+lS5dm3LhxlC1blv/85z+MGTOGf/75h+rVq/POO+9w2mmnnciv8Rhbt25l7969nHfeeQDccsstTJ06lXbt2h2znIhw4MABDh8+zMGDB8mfPz9Fixbll19+oWbNmpQpUwaAiy++mClTptCmTRtOO+00KleuzIIFC2jatGmGY/TDip7AddpXvz4MHAjXXecGFTImC8qXLx8vvPAC/fr1Y9iwYeT3hs398MMP2bhxIz/99BNjx47lhx9+OGa9okWLsmDBAnr37k3fvn0B6N27N7fccgvLly+na9eu9OnTJ+L0QYMG8cUXX7Bs2TI++eQT8ufPz6BBg+jcuTNLly5NM0mAKyZbv349F3hjwa9cuZLGjRsfs0yTJk1YuXIl+/btY9++fVSrVi3d8zFnzpxjttO7d28WLlzIihUrOHjwINOmTUuZt3v3br799lv69OnDvffey+TJk1m8eDG33XYbjz32GADXXnstCxcuZNmyZcTFxfHGG28ct8+ZM2fSsGHD417Nmzc/btnffvuN8uXLp3wuX758SrIM1bFjRwoVKkTZsmWpWLEi/fv3p2TJklSvXp01a9awceNGDh8+zNSpU9myZcsx52z27NnpnqeTZXcUffvC8OGukvrLL20gIeNLet/8o+nzzz+nbNmyrFixgku8v9fvv/+e66+/njx58nDmmWfSunXrY9a54YYbUn7269cPgB9++IEPP/wQgJtvvjml/Dut6eeffz7du3enU6dOXHvttb5inT17NvXr12ft2rU88sgjnHnmmYAr8k7dOid5Wrh5adm6dWvKt21wF/Hnn3+ev/76i507dxIfH0/79u0BUhLZ2rVrjzl3R44coWzZsgCsWLGCxx9/nN27d7N//34uu+yy4/bZunVrli5d6is+DVMfEe7YFixYQN68efn999/ZtWsXLVu25OKLL6Zq1aqMGjWKzp07kydPHpo3b86GDRtS1jv99NNZs2aNr1hORu5MFElJoOq6/G7aFJ58EgYMgAIFgo7MmIiWLl3KjBkzmDdvHi1atKBLly6ULVs27AUpVOjFKa2LcHrTR48ezfz58/nss89o2LChr4tly5YtmTZtGuvWraNFixZcc801NGzYkPj4eBYtWkT9+vVTll2yZAl16tShaNGiFCpUiA0bNlA1ne5wChYsmPLMwKFDh7jnnntYtGgRFSpUYODAgcc8T1CoUCHAXbzj4+OPu+sC6N69O1OnTqVBgwaMGzeOWcnPToWYOXNmSrINddpppzF37txjppUvX56EhISUzwkJCZx11lnHrTt+/Hjatm1Lvnz5OP300zn//PNZtGgRVatWpX379inJbsyYMeQNGarg0KFDFCxYMNIpyhS5r+hp2TLXad9rr7nPN94ITz9tScJkearK3XffzbBhw6hYsSIPPvhgSkuiFi1aMGXKFJKSkvjf//533AVu0qRJKT+Ty8ubN2/OxIkTAXjvvfdo4T08mtb0X375hWbNmjFo0CBKly7Nli1bKFKkCPv27Us39po1azJgwACGDh0KQP/+/Xn22WfZuHEjABs3bmTIkCE88MADAAwYMIBevXqxd6+rB9q7dy9jxow5brtxcXH8/PPPAClJoXTp0uzfv5/JkyeHjaVWrVps27YtJVEkJiaycuVKAPbt20fZsmVJTEzkvffeC7t+8h1F6lfqJAFQtmxZihQpwrx581BV3n77bTp06HDcchUrVuSbb75BVTlw4ADz5s2jdu3aAPz555+AK74bOXIkd9xxR8p669ato27dumHjzFQZrQUP6pXhVk/79qnef79q3ryqZcqoTpqUse2YXCvoVk+vv/66durUKeXz4cOH9eyzz9ZZs2bpkSNH9K677tK4uDjt0KGDtm3bVr/88ktVda2eBg4cqE2bNtUmTZro+vXrVVX1119/1datW2u9evX0oosu0k2bNkWcfs0112jdunU1Pj5e+/Tpo0lJSbpjxw5t0qSJNmjQQCdOnHhMvKGtnlRdy6ezzjorpaXTlClTtG7dulqrVi2tW7euTpkyJWXZpKQkHTp0qNasWVPj4+O1YcOG+s477xx3Tr777jvt2rVryufHHntMq1Wrpm3atNHu3bvrU089paqqF154oS5cuDBluR9//FFbtmyp9evX1zp16uiYMWNUVXXkyJFauXJlvfDCC7V3797arVu3E/slhbFw4UKNj4/XqlWraq9evTQpKUlVVT/++GN94oknVFV137592rFjR61Tp47GxcXp888/n7J+ly5dNC4uTuPi4nTChAnHbLtRo0ZhW7RldqunwC/8J/rKUKKYMUO1fHl3uD16qO7ceeLbMLle0IkiPfv27VNV1e3bt2vVqlV169atAUcUG+eff77u2rUr6DBibsmSJXrTTTeFnWfNYzMif34oWRImTXLFTsbkQFdeeSW7d+/mn3/+4YknnkipOM7pXnrpJTZv3kzx4sWDDiWmtm/fzuDBg2Oyr5yZKBIT3fCje/bAM8/ABRe4Dvzy5L4qGZN7hKt4zQ2a5dLm7JfEsIVmzrtyzp0LjRu7gYRWr3YtnMCShMkUmk7rImOCFo2/0Zxz9dy5E3r0gPPPh927YepUmDLFEoTJNAUKFGDHjh2WLEyWperGoyiQya04c07R044dMH489O8PTz0FhQsHHZHJYZLbxG/bti3oUIxJU/IId5kpeyeKtWtdBfWTT0KNGrBpE5QqFXRUJofKly9fpo4aZkx2EdVyGRFpKyJrReRnEXkkzHwRkVe8+ctF5GxfGz540CWH+vXh3/+G5L5PLEkYY0ymi1qiEJG8wGtAO6AOcIOI1Em1WDughvfqAYxKb7sFD+2HevVg8GC4/npYswYqVMjk6I0xxiSL5h1FU+BnVd2gqv8AE4HUz653AN72ngeZBxQXkbKRNnr69q2ugvqrr+Ddd+GMM6ITvTHGGCC6dRTlgC0hnxOA1A2ewy1TDtgaupCI9MDdcQD8LevXr+DiizM32uypNBCdocuyHzsXR9m5OMrOxVG1MrpiNBNFuK4oU7cr9LMMqjoGGAMgIotUtcnJh5f92bk4ys7FUXYujrJzcZSILMroutEsekoAQisPygO/Z2AZY4wxAYpmolgI1BCRKiKSH+gCfJJqmU+AW7zWT+cCe1R1a+oNGWOMCU7Uip5U9bCI9Aa+APIC/1XVlSLS05s/GpgOXA78DPwF3Opj08d3Sp972bk4ys7FUXYujrJzcVSGz4VYdwTGGGMisY6QjDHGRGSJwhhjTERZNlFErfuPbMjHuejqnYPlIjJXRBoEEWcspHcuQpY7R0SOiEjHWMYXS37OhYi0EpGlIrJSRL6NdYyx4uN/pJiIfCoiy7xz4ac+NNsRkf+KyJ8isiKN+Rm7bmZ0aLxovnCV378AVYH8wDKgTqplLgc+xz2LcS4wP+i4AzwXzYES3vt2uflchCz3Da6xRMeg4w7w76I4sAqo6H0+Pei4AzwXjwJDvfdlgJ1A/qBjj8K5uAA4G1iRxvwMXTez6h1FVLr/yKbSPReqOldVd3kf5+GeR8mJ/PxdANwLTAH+jGVwMebnXNwIfKiqmwFUNaeeDz/nQoEiIiJAYVyiOBzbMKNPVb/DHVtaMnTdzKqJIq2uPU50mZzgRI/zdtw3hpwo3XMhIuWAa4DRMYwrCH7+LmoCJURklogsFpFbYhZdbPk5FyOAONwDvT8B96lqUmzCy1IydN3MquNRZFr3HzmA7+MUkda4RNEiqhEFx8+5GAY8rKpH3JfHHMvPuTgFaAy0AQoCP4jIPFVdF+3gYszPubgMWApcBFQDZojIbFXdG+XYspoMXTezaqKw7j+O8nWcIlIfGAu0U9UdMYot1vyciybARC9JlAYuF5HDqjo1JhHGjt//ke2qegA4ICLfAQ2AnJYo/JyLW4Hn1BXU/ywivwK1gQWxCTHLyNB1M6sWPVn3H0eley5EpCLwIXBzDvy2GCrdc6GqVVS1sqpWBiYD9+TAJAH+/kc+BlqKyCkichqu9+bVMY4zFvyci824OytE5AxcT6obYhpl1pCh62aWvKPQ6HX/ke34PBdPAqWAkd436cOaA3vM9HkucgU/50JVV4vI/wHLgSRgrKqGbTaZnfn8uxgMjBORn3DFLw+rao7rflxEJgCtgNIikgA8BeSDk7tuWhcexhhjIsqqRU/GGGOyCEsUxhhjIrJEYYwxJiJLFMYYYyKyRGGMMSYiSxTmGOn1Phmy3GNeL5zLvd5Jm2VyHNNFpLj3vo+IrBaR90Tkqki9xnrLz/V+VhaRG33u72oRedJ7P1BEfvOOa6mIPBdhvYEi0t/3gYXfRmUROejta5WIjBaRE/rfFJEmIvKK976ViDQPmdczM7rvSHVeVonIDT7W6es9w5HechNFpMbJxmiiw5rHmmOIyAXAflzHYXXTWOY84GWglar+LSKlcT1xRuXJeBFZg3vi/NcTXK8V0F9Vr/Sx7FzgKlXdLiIDgf2q+qKP9XwvG2EblYFpqlpXRE7B9Xw7TFU/zOD2Tjqm9LbrXdQXA6VUNTHCOhuBJuk9syAiFwI3qeqdmRiyySR2R2GO4aP3SYCyuK4h/vbW2Z6cJERko4gMFZEF3qu6N72MiEwRkYXe63xvemEReVNEfvLuTq4L2U5pERmN6z76ExHpJyLdRWSEt8wZIvKRuDEGliV/ixaR/V6cz+GeTF7qrTtbRBomH4SIzBGR+iJSE/g70sVMRO704l7mHcdx35K9O59V3nFM9KYV8u7SForIjyISrrfb0PN/GJgLVBeRSiLytbe9r8U9gY+IXC8iK7xYvvOmtRKRaV7S6Qn08467ZfJdj4jEiUhKlxXencxy731jEflWXOeBX0g6PYqq6nrcA1slvPVHicgicXeZTyefD+AsYKaIzPSmXSoiP4jIEhH5QEQKe5ucDVzsJUqT1cS6v3R7Zf0XUJk0+rP35hfGdbC2DhgJXBgybyPwmPf+Ftw3ZYDxQAvvfUVgtfd+KO7bc/L6JUK2UzrM++7ACO/9JKCv9z4vUMx7v9/72Sp5/97nbsn7wvWsush7fyvwUshyA4HfvGNciutQrlTI/GeAe0OW7e+9/x041Xtf3Ps5BPdNGdz4EOuAQmmdb+A0XJcU7YBPgW7e9NuAqd77n4ByqfaTcqyhMYWJcSlQ1Xv/MPA47snduUAZb3pn3NPNqX/vods5G5gdMq9kyO9hFlA/zO+uNPBd8vF7+38yZBszgMZB//3b6/iX3VGYE6aq+3G9kvYAtgGTRKR7yCITQn6e572/GBghIktx/c0UFZEi3vTXQra9C/8uAkZ56x1R1T3pLP8BcKWI5MNdeMd508t6xxHq36ra0Ht9AdT17kh+AroC8WG2vxx4T0Ru4uhYB5cCj3jHPQsogEuUqVXzlpkDfKaqn+PO3Xhv/jsc7RV4Dq47ijtxF+YT8T7QyXvfGZdsawF1cT2qLsUlj7TGNOknImuB+bjEkayTiCwBfsSdmzph1j3Xmz7H2083oFLI/D9xdyAmi7HbPJMuEamA+3YLMFpdP0JHcBe+Wd7FsxtHL7yhFV/J7/MA56nqwVTbFmLUPbyq/iUiM3CDt3TC9TQLcBAols7q44CrVXWZlxRbhVnmCtwIY1cBT4hIPK5foetUdW062/9FVRumdwjecfQU13jgCmBpaHGaD5OAD0TkQ7cpXS8i9YCVqnpeOuuCS6Avisi1wNsiUg2XaPsD56jqLhEZh0uIqQkwQ1XTqgQvgPtdmCzG7ihMulR1S8i369EiUkuObaHSENgU8rlzyM8fvPdfAr2TFwi5uKWeXuIEQvsauNtbL6+IFE01fx9QJNW0scArwEJVTa6LWQ1UT2dfRYCt3t1I19QzxbVSqqCqM4GHcMVMhXEd1d3rJUREpJG/QwNccVAX731X4HtvG9VUdb6qPgls59huoyH8cQOgqr8AR4AncEkDYC1QRlwjBUQkn5fk0qSuon0R7gtCUeAAsEdcz6zt0ohlHnC+HK23Os2rH0pWE1gZab8mGJYozDHE9T75A1BLRBJE5PYwixUG3kquuMUVJwwMmX+qiMwH7gP6edP6AE28itlVuApXcOX9JZIrZ4HWJxDufUBr745mMccXBy0HDnuVvv0AVHUxsBd4M2S574BGyRfzNDyBK26ZAawJMz8v8K4Xy4+4b967cb2W5gOWi2tyPPgEjq8PcKt3jm/2jhfgBXGV/yu82JelWu9T4Jrkyuww250E3IQrhkLd8KEdgaHe72Apbhz29AwC7sfVmfyIu8j/F1c0lmwM8LmIzFTVbbg6pgneMc3DjQmR3PX3Qc2ZQwVke9Y81mQq8dkcMigichauyKy2hgyFKSLDgU9V9augYsvNvES+V1XfCDoWczy7ozC5hriHzubjWmWlHi95CK7FkQnGbuCtoIMw4dkdhTHGmIjsjsIYY0xEliiMMcZEZInCGGNMRJYojDHGRGSJwhhjTET/D4ANLEvpBdYqAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "## Plot ROC AUC Curve\n",
    "from sklearn.metrics import roc_auc_score,roc_curve\n",
    "plt.figure()\n",
    "\n",
    "# Add the models to the list that you want to view on the ROC plot\n",
    "auc_models = [\n",
    "{\n",
    "    'label': 'Xgboost',\n",
    "    'model':XGBClassifier(n_estimators=200,max_depth=12,learning_rate=0.1,\n",
    "                           colsample_bytree=1),\n",
    "    'auc':  0.8882\n",
    "},\n",
    "    \n",
    "]\n",
    "# create loop through all model\n",
    "for algo in auc_models:\n",
    "    model = algo['model'] # select the model\n",
    "    model.fit(X_train, y_train) # train the model\n",
    "# Compute False postive rate, and True positive rate\n",
    "    fpr, tpr, thresholds = roc_curve(y_test, model.predict_proba(X_test)[:,1])\n",
    "# Calculate Area under the curve to display on the plot\n",
    "    plt.plot(fpr, tpr, label='%s ROC (area = %0.2f)' % (algo['label'], algo['auc']))\n",
    "# Custom settings for the plot \n",
    "plt.plot([0, 1], [0, 1],'r--')\n",
    "plt.xlim([0.0, 1.0])\n",
    "plt.ylim([0.0, 1.05])\n",
    "plt.xlabel('1-Specificity(False Positive Rate)')\n",
    "plt.ylabel('Sensitivity(True Positive Rate)')\n",
    "plt.title('Receiver Operating Characteristic')\n",
    "plt.legend(loc=\"lower right\")\n",
    "plt.savefig(\"auc.png\")\n",
    "plt.show() "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
